1. Introduction

Differential equations and Markov chains are the basic models of dynamical 
systems in a deterministic and a probabilistic context, respectively. Since the 
analysis of differential equations is often more feasible and efficient, both from a 
mathematical and a computational point of view, it is of interest to understand 
in some generality when the sample paths of a Markov chain can be guaranteed 
to lie, with high probability, close to the solution of a differential equation. 

We shall obtain a number of estimates, given explicitly in terms of the Markov 
transition rates, for the probability that a Markov chain deviates further than 
a given distance from the solution to a suitably chosen differential equation. 
The basic method is simply a combination of Gronwall's lemma with martingale
inequalities. The intended contribution of this paper is to set out in a
convenient form some estimates that can be deduced in this way, along with 
some illustrations of their use. Although it is widely understood how to arrive 
at a suitable differential equation, the justification of an approximation state- 
ment can be more challenging, particularly if one has cause to push beyond the 
scope of classical weak convergence results. We have found the use of explicit 
estimates effective, for example, when the Markov chain terminates abruptly 
on leaving some domain [4], or when convergence is needed over a long time 
interval [23], or for processes having a large number of components with very 
different scales [14]. 

The first step in our approach is a choice of coordinate functions for the 
given Markov chain: these are used to rescale the process, whose values might 
typically form a vector of non-negative integers, to one which may lie close to a 
continuously evolving path. The choice of coordinate functions may also be used 
to forget some components of the Markov chain which do not behave suitably
and further, as is sometimes necessary, to correct the values of the remaining 
components to take account of the values of the forgotten components. This is 
illustrated in the examples in Sections 6 and 7. The behaviour of forgotten com- 
ponents can sometimes be approximated by a random process having relatively 
simple characteristics, which are determined by the differential equation. This is 
illustrated in the example in Section 5, where it is used to show the asymptotic 
independence of individuals in a large population. 

We have been motivated by two main areas of application. The first is to population
processes, encompassing epidemic models, queueing and network models,
and models for chemical reactions. It is often found in models of interest that 
certain variables oscillate rapidly and randomly while others, suitably rescaled, 
are close to deterministic. It was a primary motivation to find an extension of 
our quantitative estimates which was useful in such a context. The example 
in Section 6, which is drawn from [2], shows that this is possible. The second 
area of application is the analysis of randomized algorithms and combinatorial 
structures. Here, the use of differential equation approximations has become an 
important tool. The example in Section 7 gives an alternative treatment and 
generalization of the $k$-core asymptotics discovered in [19].

The martingale estimates we need are derived from scratch in the Appendix, 
using a general procedure for the identification of martingales associated to a 
Markov chain. We have taken the opportunity to give a justification of this 
procedure, starting from a presentation of the chain in terms of its jump chain 
and holding times. We found it interesting to do this without passing through 
the characterization of Markov chains in terms of semigroups and generators. 

The authors are grateful to Perla Sousi and to a referee for a careful reading 
of an earlier version of this paper, which has helped to clarify the present work. 

2. Survey of related literature 

There is a well-developed body of literature devoted to the general question 
of the convergence of Markov processes, which includes as a special case the 
question we address in this paper. This special case arises under fluid limit or 
law of large numbers scaling, where, for large N, jumps whose size is of order 
1/N occur at a rate of order N. This is to be distinguished from diffusive or 
central limit scaling, where jumps of mean zero and of size of order $1/\sqrt{N}$ occur
at a rate of order N. Just as in the classical central limit theorem, a Gaussian 
diffusive limit can be used to describe to first order the fluctuations of a process 
around its fluid limit. 

Both sorts of limit are presented in the books by Ethier and Kurtz [6, Section 
7.4], Jacod and Shiryaev [8, Section IX. 4b], and Kallenberg [9]. These works 
develop conditions on the transition operators or rate kernels of a sequence 
of Markov chains which are sufficient to imply the weak convergence of the
corresponding processes. Trotter's paper [22] was one of the first to take this 
point of view. 

The fluid limit is more elementary and often allows, with advantage, a more
direct approach. One identifies a limiting drift b of the processes, which we shall 
suppose to be a Lipschitz vector field, and then the limit is the deterministic 
path obtained by solving the differential equation $\dot{x}_t = b(x_t)$. Kurtz [12] describes
some sufficient conditions for weak convergence of processes in this context. 
Since the limit is continuous in time, weak convergence is here simply convergence
in probability to 0 of the maximal deviation from the limit path over
any given compact time interval. Later, exponential martingale estimates were
used to prove decay of error probabilities at an exponential rate. See the book 
of Shwartz and Weiss [21]. This is the direction also of the present paper. Dif- 
ferential equation approximations for stochastic systems with small noise have 
been studied for many sorts of process other than Markov processes. See the 
book of Kushner and Yin [13]. 

Applications of fluid limits for Markov chains are scattered across many fields. 
See [3] on epidemiology and [21] on communications and computer networks. 
Much has been achieved by the identification of deterministic limit behaviour 
when randomized algorithms are applied to large combinatorial problems, or 
deterministic algorithms are applied to large random combinatorial structures.
Examples include Karp and Sipser's seminal paper [10] on maximal matchings, 
Hajek's analysis [7] of communications protocols, Mitzenmacher's [16] balanced 
allocations, and the analysis of Boolean satisfiability by Achlioptas [1] and Semerjian
and Monasson [18]. A general framework for this sort of application was
developed by Wormald and others, see [19], [24]. 

Finally, the emergence of deterministic macroscopic evolutions from microscopic
behaviour, often assumed stochastic, is a more general phenomenon than ad- 
dressed in the literature mentioned above. We have only considered scaling the 
sizes of the components of a Markov chain. In random models where each com- 
ponent counts the number of particles at a given spatial location, it is natural 
to scale also these locations, leading sometimes to macroscopic laws governed 
by partial rather than ordinary differential equations. This is the field of hydro- 
dynamic limits - see, for example, Kipnis and Landim [11], for an introduction. 

3. Some simple motivating examples 

We now give a series of examples of Markov processes, each of which takes 
many small jumps at a fast rate. The drift is the product of the average jump 
by the total rate, which may vary from state to state. In cases where there are 
a number of different types of jump, one can compute the drift as a sum over
types of the size of the jump multiplied by its rate. We write down the drift and
hence obtain a differential equation. In the rest of the paper, we give conditions 
under which the Markov chain will be well approximated by solutions of this 
equation. In each of the examples there is a parameter N which quantifies the 
smallness of the jumps and the compensating largeness of the jump rates. The 
approximations will be good when N is large. 
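
The recipe just described (drift equals the sum over jump types of jump size times
rate) can be sketched in a few lines of Python. This is only an illustration: the
names `drift` and `queue_jump_types` are hypothetical helpers introduced here, and
the queue used as a check is the infinite-server queue of Subsection 3.2.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]
# One jump type: (jump vector, rate as a function of the rescaled state x).
JumpType = Tuple[Vector, Callable[[Vector], float]]


def drift(x: Vector, jump_types: Sequence[JumpType]) -> Vector:
    """Sum over jump types of (jump size) * (rate at x)."""
    total = [0.0] * len(x)
    for jump, rate in jump_types:
        r = rate(x)
        for i, component in enumerate(jump):
            total[i] += component * r
    return tuple(total)


def queue_jump_types(n: int) -> Sequence[JumpType]:
    """Jump types of the rescaled infinite-server queue (illustrative choice)."""
    return [
        ((1.0 / n,), lambda x: float(n)),    # arrival: +1/N at rate N
        ((-1.0 / n,), lambda x: n * x[0]),   # departure: -1/N at rate N*x
    ]


# The drift at x should be 1 - x, whatever the value of N.
print(drift((0.25,), queue_jump_types(100)))
```

The parameter N cancels between jump size and jump rate, which is why the drift,
and hence the differential equation, does not depend on it.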

3.1. Poisson process

Take $(X_t)_{t \ge 0}$ to be a Poisson process of rate $\lambda N$, and set
$\mathbf{X}_t = X_t/N$. Note that $\mathbf{X}$ takes jumps of size $1/N$ at rate $\lambda N$.
The drift is then $\lambda$ and the differential equation is

\[ \dot{x}_t = \lambda. \]

If we take as initial state $\mathbf{X}_0 = x_0 = 0$, then we may expect that $\mathbf{X}_t$
stays close to the solution $x_t = \lambda t$. This is a law of large numbers for the
Poisson process.

3.2. $M_N/M_1/\infty$ queue

Consider a queue with arrivals at rate $N$, exponential service times of mean 1,
and infinitely many servers. Write $X_t$ for the number of customers present at
time $t$. Set $\mathbf{X}_t = X_t/N$, then $\mathbf{X}$ is a Markov chain, which jumps by $1/N$ at
rate $N$, and jumps by $-1/N$ at rate $N\mathbf{X}_t$. The drift is then $1 - x$ and the
differential equation is

\[ \dot{x}_t = 1 - x_t. \]

The solution of this equation is given by $x_t = 1 + (x_0 - 1)e^{-t}$, so we may expect
that, for large $N$, the queue size stabilizes near $N$, at exponential rate 1.
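
The expected closeness of the rescaled queue to the solution of $\dot{x}_t = 1 - x_t$
can be checked numerically. The following is a minimal sketch, not part of the
original analysis: the function name `sup_deviation_queue`, the seed and the
parameter values are all assumptions made for illustration. It simulates the jump
chain and holding times directly and returns the largest gap between the rescaled
queue length and the solution $1 + (x_0 - 1)e^{-t}$ over $[0, t_{end}]$.

```python
import math
import random


def sup_deviation_queue(n: int, x0: float, t_end: float, seed: int = 0) -> float:
    """Return sup over [0, t_end] of |X_t/N - x_t| for one simulated path."""
    rng = random.Random(seed)
    count = round(n * x0)              # customers present at time 0
    start = count / n                  # rescaled initial state actually used

    def solution(s: float) -> float:
        return 1.0 + (start - 1.0) * math.exp(-s)

    t, worst = 0.0, 0.0
    while True:
        total_rate = n + count         # arrivals at rate N, departures at rate X_t
        t_next = t + rng.expovariate(total_rate)
        if t_next >= t_end:
            return max(worst, abs(count / n - solution(t_end)))
        # The path is constant between jumps and the solution is monotone,
        # so the gap is largest just before or just after a jump.
        worst = max(worst, abs(count / n - solution(t_next)))
        if rng.random() * total_rate < n:
            count += 1                 # arrival
        else:
            count -= 1                 # departure
        t = t_next
        worst = max(worst, abs(count / n - solution(t)))


print(sup_deviation_queue(n=10_000, x0=0.0, t_end=5.0, seed=1))
```

For $N$ of order $10^4$ one would expect the printed deviation to be of order
$N^{-1/2}$, in line with the law of large numbers scaling described in Section 2.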

3.3. Chemical reaction $A + B \leftrightarrow C$

In a reversible reaction, pairs of molecules of types $A$ and $B$ become a single
molecule of type $C$ at rate $\lambda/N$, and molecules of type $C$ become a pair of
molecules of types $A$ and $B$ at rate $\mu$. Write $A_t, B_t, C_t$ for the numbers of
molecules of each type present at time $t$. Set

\[ \mathbf{X}_t = (X^1_t, X^2_t, X^3_t) = (A_t, B_t, C_t)/N, \]

then $\mathbf{X}$ is a Markov chain, which makes jumps of $(-1,-1,1)/N$ at rate
$(\lambda/N)(N X^1_t)(N X^2_t)$, and makes jumps of $(1,1,-1)/N$ at rate $\mu (N X^3_t)$. The
drift is then $(\mu x^3 - \lambda x^1 x^2,\ \mu x^3 - \lambda x^1 x^2,\ \lambda x^1 x^2 - \mu x^3)$ and the
differential equation is, in components,

\[ \dot{x}^1_t = \mu x^3_t - \lambda x^1_t x^2_t, \qquad \dot{x}^2_t = \mu x^3_t - \lambda x^1_t x^2_t, \qquad \dot{x}^3_t = \lambda x^1_t x^2_t - \mu x^3_t. \]

Any vector $(x^1, x^2, x^3)$ with $\mu x^3 = \lambda x^1 x^2$ is a fixed point of this equation and
may be expected to correspond to an equilibrium state of the system.

3.4. Gunfight 

Two gangs of gunmen fire at each other. On each side, each surviving gunman 
hits one of the opposing gang randomly, at rate $\alpha$ for gang $A$ and at rate $\beta$
for gang $B$. Write $A_t$ and $B_t$ for the numbers still firing on each side at time
$t$. Set $\mathbf{X}_t = (X^1_t, X^2_t) = (A_t, B_t)/N$, then $\mathbf{X}$ is a Markov chain and jumps by
$(0,-1)/N$ at rate $\alpha N X^1_t$, and by $(-1,0)/N$ at rate $\beta N X^2_t$. The drift is then
$(-\beta x^2, -\alpha x^1)$ and the differential equation is, in components,

\[ \dot{x}^1_t = -\beta x^2_t, \qquad \dot{x}^2_t = -\alpha x^1_t. \]

Note that, in this case, the parameter $N$ does not enter the description of the
model. However, the theory will give an informative approximation only for
initial conditions of the type $(A_0, B_0) = N(a_0, b_0)$. The reader may like to solve
the equation and see who wins the fight.
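
For the reader who prefers to check an answer, here is one way to settle the
question, sketched under the assumption that the differential equation is read off
from the drift $(-\beta x^2, -\alpha x^1)$ stated above. The conserved quantity below
is a standard device and is not taken from the text.

```latex
% Differentiate alpha (x^1)^2 - beta (x^2)^2 along a solution of
%   \dot{x}^1_t = -\beta x^2_t,  \dot{x}^2_t = -\alpha x^1_t.
\[
  \frac{d}{dt}\Bigl( \alpha (x^1_t)^2 - \beta (x^2_t)^2 \Bigr)
  = 2\alpha x^1_t \,(-\beta x^2_t) - 2\beta x^2_t \,(-\alpha x^1_t) = 0 .
\]
% Hence alpha (x^1_t)^2 - beta (x^2_t)^2 = alpha a_0^2 - beta b_0^2 for all t
% up to the first time a component reaches zero.
% If alpha a_0^2 > beta b_0^2, then x^2 reaches zero first, with
%   x^1 = sqrt(a_0^2 - (beta/alpha) b_0^2) > 0 at that time,
% so gang A is the one expected to survive; the roles swap if the inequality
% is reversed.
```

In the fluid limit, then, the outcome depends on firing rate times the square of
the initial strength, so numbers count for more than marksmanship.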

3.5. Continuous time branching processes 

Each individual in a population lives for an exponentially distributed time of 
mean $1/N$, whereupon it is replaced by a random number $Z$ of identical offspring,
where $Z$ has finite mean $\mu$. Distinct individuals behave independently.
Write $X_t$ for the number of individuals present at time $t$. Set $\mathbf{X}_t = X_t/N$, then
$\mathbf{X}$ is a Markov chain, which jumps by $(k-1)/N$ at rate $N \mathbf{X}_t \mathbb{P}(Z = k)$ for all
$k \in \mathbb{Z}^+$. The drift is then $\sum_k (k-1)\mathbb{P}(Z = k)\,x = (\mu - 1)x$ and the differential
equation is

\[ \dot{x}_t = (\mu - 1) x_t. \]

This equation gives a first order approximation for the evolution of the population
size. In particular, it is clear that the cases where $\mu < 1$, $\mu = 1$, $\mu > 1$
should show very different long-time behaviour.

4. Derivation of the estimates 

Let $X = (X_t)_{t \ge 0}$ be a continuous-time Markov chain with countable$^1$
state-space $S$. Assume that in every state $\xi \in S$ the total jump rate $q(\xi)$ is
finite, and write $q(\xi, \xi')$ for the jump rate from $\xi$ to $\xi'$, for each pair of
distinct states $\xi$ and $\xi'$. We assume that $X$ does not explode: a simple sufficient
condition for this is that the jump rates are bounded, another is that $X$ is recurrent.

We make a choice of coordinate functions $x^i : S \to \mathbb{R}$, for $i = 1, \dots, d$, and
write $\mathbf{x} = (x^1, \dots, x^d) : S \to \mathbb{R}^d$. Consider the $\mathbb{R}^d$-valued process
$\mathbf{X} = (\mathbf{X}_t)_{t \ge 0}$ given by $\mathbf{X}_t = (X^1_t, \dots, X^d_t) = \mathbf{x}(X_t)$. Define, for each
$\xi \in S$, the drift vector

\[ \beta(\xi) = \sum_{\xi' \ne \xi} \bigl( \mathbf{x}(\xi') - \mathbf{x}(\xi) \bigr)\, q(\xi, \xi'), \]

where we set $\beta(\xi) = \infty$ if this sum fails to converge absolutely.

$^1$ The extension of the results of this paper to the case of a general measurable state-space
is a routine exercise.

Our main goal is the derivation of explicit estimates which may allow the
approximation of X by the solution of a differential equation. We shall also 
discuss how the computation of certain associated probabilities can be simplified 
when such an approximation is possible. 

Let $U$ be a subset of $\mathbb{R}^d$ and let $x_0 \in U$. Let $b : U \to \mathbb{R}^d$ be a Lipschitz
vector field. The differential equation $\dot{x}_t = b(x_t)$ has a unique maximal solution
$(x_t)_{t < \zeta}$, starting from $x_0$, with $x_t \in U$ for all $t < \zeta$. Maximal here refers to $\zeta$
and means that there is no solution in $U$ defined on a longer time interval. Our
analysis is based on a comparison of the equations

\[ \mathbf{X}_t = \mathbf{X}_0 + M_t + \int_0^t \beta(X_s)\,ds, \qquad 0 \le t \le T_1, \]

\[ x_t = x_0 + \int_0^t b(x_s)\,ds, \qquad 0 \le t < \zeta, \]

where $T_1 = \inf\{t \ge 0 : \beta(X_t) = \infty\}$ and where the first equation serves to define
the process $(M_t)_{0 \le t \le T_1}$.

4.1. $L^2$-estimates

The simplest estimate we shall give is obtained by a combination of Doob's
$L^2$-inequality and Gronwall's lemma. Doob's $L^2$-inequality states that, for any
martingale $(M_t)_{t \le t_0}$,

\[ \mathbb{E}\Bigl( \sup_{t \le t_0} |M_t|^2 \Bigr) \le 4\, \mathbb{E}\bigl( |M_{t_0}|^2 \bigr). \]

Gronwall's lemma states that, for any real-valued integrable function $f$ on the
interval $[0, t_0]$, the inequality

\[ f(t) \le C + D \int_0^t f(s)\,ds, \quad \text{for all } t, \tag{1} \]

implies that $f(t_0) \le C e^{D t_0}$.

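Write, for now, $K$ for the Lipschitz constant of $b$ on $U$ with respect to the
Euclidean norm $|\cdot|$. Fix $t_0 < \zeta$ and $\varepsilon > 0$ and assume that$^2$,

\[ \text{for all } \xi \in S \text{ and } t \le t_0, \quad |\mathbf{x}(\xi) - x_t| \le \varepsilon \implies \mathbf{x}(\xi) \in U. \]

Set $\delta = \varepsilon e^{-K t_0}/3$ and fix $A > 0$. For our estimate to be useful it will be necessary
that $A$ be small compared to $\varepsilon^2$. Set $T = \inf\{t \ge 0 : \mathbf{X}_t \notin U\}$; see Figure 1.
Define, for $\xi \in S$,

\[ \alpha(\xi) = \sum_{\xi' \ne \xi} |\mathbf{x}(\xi') - \mathbf{x}(\xi)|^2\, q(\xi, \xi'). \]

$^2$ A simpler but stronger condition is to require that the path $(x_t)_{t \le t_0}$ lies at a
distance greater than $\varepsilon$ from the complement of $U$.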

Fig 1. The unit square $(0,1)^2$ is a possible choice of the set $U$. The inner and outer solid curves
bound a tube around the deterministic solution (red) of the ordinary differential equation, 
which starts inside U. The realization shown of the Markov chain trajectory does not leave 
the tube before exit from U. This is a realization of the stochastic epidemic, which will be 
discussed in more detail in Section 5. 



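Consider the events$^3$

\[ \Omega_0 = \{ |\mathbf{X}_0 - x_0| \le \delta \}, \qquad
   \Omega_1 = \Bigl\{ \int_0^{T \wedge t_0} |\beta(X_t) - b(\mathbf{x}(X_t))|\,dt \le \delta \Bigr\} \]

and

\[ \Omega_2 = \Bigl\{ \int_0^{T \wedge t_0} \alpha(X_t)\,dt \le A t_0 \Bigr\}, \qquad
   \Omega_2' = \Bigl\{ T \wedge t_0 \le T_1 \text{ and } \sup_{t \le T \wedge t_0} |M_t| \le \delta \Bigr\}. \]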



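Consider the random function $f(t) = \sup_{s \le t} |\mathbf{X}_s - x_s|$ on the interval $[0, T \wedge t_0]$.
Then

\[ f(t) \le |\mathbf{X}_0 - x_0| + \sup_{s \le t} |M_s| + \int_0^t |\beta(X_s) - b(\mathbf{x}(X_s))|\,ds
   + \int_0^t |b(\mathbf{X}_s) - b(x_s)|\,ds. \]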



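So, on the event $\Omega_0 \cap \Omega_1 \cap \Omega_2'$, $f$ satisfies (1) with $C = 3\delta$ and $D = K$, so
$f(T \wedge t_0) \le \varepsilon$, which implies $T > t_0$ and hence $f(t_0) \le \varepsilon$. Consider now the
stopping time

\[ \tilde{T} = T \wedge t_0 \wedge \inf\Bigl\{ t \ge 0 : \int_0^t \alpha(X_s)\,ds > A t_0 \Bigr\}. \]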



$^3$ In examples, we shall often have some or all of these events equal to $\Omega$. We may have
$\mathbf{X}_0 = x_0$ and $\beta = b \circ \mathbf{x}$, or at least be able to show that $\beta - b \circ \mathbf{x}$ is uniformly
small on $S$. The example discussed in Subsection 6.1 exploits fully the given form of $\Omega_1$,
in that the integrand $\beta(X_t) - b(\mathbf{x}(X_t))$ cannot be bounded uniformly in a suitable way,
whereas the integral is suitably small with high probability.



Darling and Norris / Differential equation approximations for Markov chains



By Cauchy-Schwarz, we have $|\beta(\xi)|^2 \le q(\xi)\alpha(\xi)$ for all $\xi \in S$, so $\tilde{T} \le T_1$. By a
standard argument using Doob's $L^2$-inequality, which is recalled in Proposition
8.7, we have

\[ \mathbb{E}\Bigl( \sup_{t \le \tilde{T}} |M_t|^2 \Bigr) \le 4 A t_0. \]

On $\Omega_2$, we have $\tilde{T} = T \wedge t_0$, so $\Omega_2 \setminus \Omega_2' \subseteq \{ \sup_{t \le \tilde{T}} |M_t| > \delta \}$ and so, by
Chebyshev's inequality, $\mathbb{P}(\Omega_2 \setminus \Omega_2') \le 4 A t_0/\delta^2$. We have proved the following
result, which can sometimes enable us to show that the situation illustrated in
Figure 1 occurs with high probability.

Theorem 4.1. Under the above conditions,

\[ \mathbb{P}\Bigl( \sup_{t \le t_0} |\mathbf{X}_t - x_t| > \varepsilon \Bigr) \le 4 A t_0/\delta^2 + \mathbb{P}(\Omega_0^c \cup \Omega_1^c \cup \Omega_2^c). \]
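
To get a feel for the size of the first term in Theorem 4.1, the following small
sketch evaluates $4At_0/\delta^2$ with $\delta = \varepsilon e^{-Kt_0}/3$. The function name
`l2_bound` and the numerical values are illustrative assumptions, not taken from
the paper.

```python
import math


def l2_bound(eps: float, K: float, t0: float, A: float) -> float:
    """First term of the Theorem 4.1 bound: 4*A*t0/delta^2, delta = eps*exp(-K*t0)/3."""
    delta = eps * math.exp(-K * t0) / 3.0
    return 4.0 * A * t0 / delta ** 2


# With jumps of size 1/N at rate of order N one typically has A of order 1/N,
# so the bound decays like 1/N for fixed eps, K and t0.
for n in (10**3, 10**4, 10**5):
    print(n, l2_bound(eps=0.1, K=1.0, t0=1.0, A=1.0 / n))
```

The factor $e^{2Kt_0}$ hidden in $1/\delta^2$ shows how quickly the estimate
degrades as the time horizon grows.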



4.2. Exponential estimates

It is clear that the preceding argument could be applied for any norm on $\mathbb{R}^d$ with
obvious modifications. We shall do this for the supremum norm $\|x\| = \max_i |x_i|$,
making at the same time a second variation in replacing the use of Doob's
$L^2$-inequality with an exponential martingale inequality. This leads to the version
of the result which we prefer for the applications we have considered. It will be
necessary to modify some assumptions and notation introduced in the preceding
subsection. We shall stick to these modifications from now on. We assume now
that $\varepsilon > 0$ and $t_0$ are chosen so that,

\[ \text{for all } \xi \in S \text{ and } t \le t_0, \quad \|\mathbf{x}(\xi) - x_t\| \le \varepsilon \implies \mathbf{x}(\xi) \in U. \tag{2} \]

Write now $K$ for the Lipschitz constant of $b$ with respect to the supremum norm.
Set $\delta = \varepsilon e^{-K t_0}/3$. Fix $A > 0$ and set $\theta = \delta/(A t_0)$. Define

\[ \sigma_\theta(x) = e^{\theta |x|} - 1 - \theta |x|, \qquad x \in \mathbb{R}, \]

and set

\[ \phi^i(\xi, \theta) = \sum_{\xi' \ne \xi} \sigma_\theta\bigl( x^i(\xi') - x^i(\xi) \bigr)\, q(\xi, \xi'), \qquad
   \phi(\xi, \theta) = \max_i \phi^i(\xi, \theta), \qquad \xi \in S. \]

Consider the events

\[ \Omega_0 = \{ \|\mathbf{X}_0 - x_0\| \le \delta \}, \qquad
   \Omega_1 = \Bigl\{ \int_0^{T \wedge t_0} \|\beta(X_t) - b(\mathbf{x}(X_t))\|\,dt \le \delta \Bigr\}, \]

\[ \Omega_2 = \Bigl\{ \int_0^{T \wedge t_0} \phi(X_t, \theta)\,dt \le \tfrac{1}{2}\theta^2 A t_0 \Bigr\} \]

and

\[ \Omega_2' = \Bigl\{ T \wedge t_0 \le T_1 \text{ and } \sup_{t \le T \wedge t_0} \|M_t\| \le \delta \Bigr\}. \]

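We can use Gronwall's lemma, as above, to see that on the event $\Omega_0 \cap \Omega_1 \cap \Omega_2'$
we have $\sup_{t \le t_0} \|\mathbf{X}_t - x_t\| \le \varepsilon$. Note that, since $\sigma_\theta(x) \ge \theta^2 |x|^2/2$ for all $x \in \mathbb{R}$,
we always have $T \wedge t_0 \le T_1$ on $\Omega_2$. Fix $i \in \{1, \dots, d\}$ and set

\[ \phi(\xi) = \sum_{\xi' \ne \xi} \Bigl\{ e^{\theta (x^i(\xi') - x^i(\xi))} - 1 - \theta \bigl( x^i(\xi') - x^i(\xi) \bigr) \Bigr\}\, q(\xi, \xi'). \]

Then $\phi(\xi) \le \phi^i(\xi, \theta) \le \phi(\xi, \theta)$, so

\[ \mathbb{P}\Bigl( \sup_{t \le T \wedge t_0} M^i_t > \delta \text{ and } \Omega_2 \Bigr)
   \le \mathbb{P}\Bigl( \sup_{t \le T \wedge t_0} M^i_t > \delta \text{ and } \int_0^{T \wedge t_0} \phi(X_t)\,dt \le \tfrac{1}{2}\theta^2 A t_0 \Bigr) \]
\[ \le \exp\{ \tfrac{1}{2}\theta^2 A t_0 - \theta\delta \} = \exp\{ -\delta^2/(2 A t_0) \}. \]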



For the second inequality, we used a standard exponential martingale inequality,
which is recalled in Proposition 8.8. Since the same argument applies also to
$-M$ and for all $i$, we thus obtain $\mathbb{P}(\Omega_2 \setminus \Omega_2') \le 2d e^{-\delta^2/(2 A t_0)}$. We have proved the
following estimate, which is often stronger than Theorem 4.1. In an asymptotic
regime where the sizes of jumps in $\mathbf{X}$ are of order $1/N$ but their rates are of order
$N$, the estimate will often allow us to prove decay of error probabilities in the
differential equation approximation at a rate exponential in $N$. The price to be
paid for this improvement is the necessity to deal with the event $\Omega_2$ just defined
rather than its more straightforward counterpart in the preceding subsection$^4$.

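Theorem 4.2. Under the above conditions,

\[ \mathbb{P}\Bigl( \sup_{t \le t_0} \|\mathbf{X}_t - x_t\| > \varepsilon \Bigr) \le 2d e^{-\delta^2/(2 A t_0)} + \mathbb{P}(\Omega_0^c \cup \Omega_1^c \cup \Omega_2^c). \]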
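
The exponential term of Theorem 4.2 is equally easy to evaluate. The sketch below
computes $2d\,e^{-\delta^2/(2At_0)}$ with $\delta = \varepsilon e^{-Kt_0}/3$; the name
`exponential_bound` and the sample values are assumptions for illustration only.

```python
import math


def exponential_bound(eps: float, K: float, t0: float, A: float, d: int) -> float:
    """First term of the Theorem 4.2 bound: 2*d*exp(-delta^2/(2*A*t0))."""
    delta = eps * math.exp(-K * t0) / 3.0
    return 2.0 * d * math.exp(-delta ** 2 / (2.0 * A * t0))


# When A is of order 1/N the exponent is linear in N, so the bound decays
# exponentially in N rather than like 1/N.
for n in (10**3, 10**4, 10**5):
    print(n, exponential_bound(eps=0.1, K=1.0, t0=1.0, A=1.0 / n, d=2))
```

For small $N$ the value can exceed 1, in which case the estimate carries no
information; it becomes useful once $N\varepsilon^2 e^{-2Kt_0}$ is large.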



4.3. Convergence of terminal values

In cases where the solution of the differential equation leaves $U$ in a finite time,
so that $\zeta < \infty$, we can adapt the argument to obtain estimates on the time
$T$ that $\mathbf{X}$ leaves $U$ and on the terminal value $\mathbf{X}_T$. The vector field $b$ can be
extended to the whole of $\mathbb{R}^d$ with the same Lipschitz constant. Let us choose
such an extension, also denoted $b$, and write now $(x_t)_{t \ge 0}$ for the unique solution
to $\dot{x}_t = b(x_t)$ starting from $x_0$. Define for $\varepsilon > 0$

\[ \zeta_\varepsilon^- = \inf\{ t \ge 0 : x \notin U \text{ for some } x \in \mathbb{R}^d \text{ with } \|x - x_t\| \le \varepsilon \}, \]
\[ \zeta_\varepsilon^+ = \inf\{ t \ge 0 : x \notin U \text{ for all } x \in \mathbb{R}^d \text{ with } \|x - x_t\| \le \varepsilon \}. \]

$^4$ The present approach is useful only when the jumps of $\mathbf{X}$ have an exponential moment,
whereas the previous $L^2$ approach required only jumps of finite variance. In many applications,
the jumps are uniformly bounded: if $J$ is an upper bound for the supremum norm of the
jumps, then, using the inequality $e^x - 1 - x \le \tfrac{1}{2} x^2 e^x$, a sufficient condition for $\Omega_2 = \Omega$ is that
$A \ge Q J^2 \exp\{\delta J/(A t_0)\}$, where $Q$ is the maximum jump rate.

and set$^5$

\[ \rho(\varepsilon) = \sup\{ \|x_t - x_\zeta\| : \zeta_\varepsilon^- \le t \le \zeta_\varepsilon^+ \}. \]

Typically we will have $\rho(\varepsilon) \to 0$ as $\varepsilon \to 0$. Indeed, if $U$ has a smooth boundary
at $x_\zeta$, and if $b(x_\zeta)$ is not tangent to this boundary, then $\rho(\varepsilon) \le C\varepsilon$ for some
$C < \infty$, for all sufficiently small $\varepsilon > 0$. However, we leave this step until we
consider specific examples. Assume now, in place of (2), that $\varepsilon$ and $t_0$ are chosen
so that $t_0 \ge \zeta_\varepsilon^+$. On $\Omega_0 \cap \Omega_1 \cap \Omega_2'$, we obtain, as above, that $f(T \wedge t_0) \le \varepsilon$, which
forces $\zeta_\varepsilon^- \le T \le \zeta_\varepsilon^+$, and hence
$\|\mathbf{X}_T - x_\zeta\| \le \|\mathbf{X}_T - x_T\| + \|x_T - x_\zeta\| \le \varepsilon + \rho(\varepsilon)$.
We have proved the following estimate$^6$.

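Theorem 4.3. Under the above conditions,

\[ \mathbb{P}\bigl( \|\mathbf{X}_T - x_\zeta\| > \varepsilon + \rho(\varepsilon) \bigr)
   \le \mathbb{P}\Bigl( T \notin [\zeta_\varepsilon^-, \zeta_\varepsilon^+] \text{ or } \sup_{t \le T \wedge t_0} \|\mathbf{X}_t - x_t\| > \varepsilon \Bigr) \]
\[ \le 2d e^{-\delta^2/(2 A t_0)} + \mathbb{P}(\Omega_0^c \cup \Omega_1^c \cup \Omega_2^c). \]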

4.4. Random processes modulated by the fluid limit

We return now to the case where $t_0 < \zeta$ and condition (2) holds. Although the
results given so far can be interpreted as saying that $\mathbf{X}$ is close to deterministic,
there are sometimes associated random quantities which we may wish to understand,
and whose behaviour can be described, approximately, in a relatively
simple way in terms of the deterministic path $(x_t)_{t \le t_0}$. To consider this in some
generality, suppose there is given a countable set $I$ and a function $y : S \to I$
and consider the process $Y = (Y_t)_{t \ge 0}$ given by $Y_t = y(X_t)$. Define, for $\xi \in S$
and $y \in I$ with $y \ne y(\xi)$, the jump rates

\[ \gamma(\xi, y) = \sum_{\xi' \in S :\, y(\xi') = y} q(\xi, \xi'). \]

We now give conditions which may allow us to approximate $Y$ by a Markov
chain with time-dependent jump rates, which are given in terms of the path
$(x_t)_{t \le t_0}$ and a non-negative function $g$ on $U \times \{(y, y') \in I \times I : y \ne y'\}$. Set
$g_t(y, y') = g(x_t, y, y')$ for $t \le t_0$. Fix $I_0 \subseteq I$ and set

\[ \kappa = \sup_{t \le t_0}\ \sup_{\|x - x_t\| \le \varepsilon,\ y \in I_0}\ \sum_{y' \ne y} |g(x, y, y') - g(x_t, y, y')|. \]

$^5$ The function $\rho$ depends on the choice of extension made of $b$ outside $U$, whereas the
distribution of $\|\mathbf{X}_T - x_\zeta\|$ does not. This is untidy, but it is not simple to optimise over
Lipschitz extensions, and in any case, this undesirable dependence of $\rho$ is a second order effect
as $\varepsilon \to 0$.

$^6$ The same argument can be made using, in place of $\zeta_\varepsilon^\pm$, the times

\[ \tilde{\zeta}_\varepsilon^- = \inf\{ t \ge 0 : \mathbf{x}(\xi) \notin U \text{ for some } \xi \in S \text{ with } \|\mathbf{x}(\xi) - x_t\| \le \varepsilon \}, \]
\[ \tilde{\zeta}_\varepsilon^+ = \inf\{ t \ge 0 : \mathbf{x}(\xi) \notin U \text{ for all } \xi \in S \text{ with } \|\mathbf{x}(\xi) - x_t\| \le \varepsilon \}. \]

This refinement can be useful if we wish to start $X$ on the boundary of $U$.

Set $T_0 = \inf\{ t \ge 0 : \mathbf{X}_t \notin U \text{ or } Y_t \notin I_0 \}$, fix $G > 0$, and define

\[ \Omega_3 = \Bigl\{ \int_0^{T_0 \wedge t_0} \sum_{y \ne y(X_t)} |\gamma(X_t, y) - g(\mathbf{x}(X_t), y(X_t), y)|\,dt \le G t_0 \Bigr\}. \]



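Theorem 4.4. There exists a time-inhomogeneous Markov chain $(y_t)_{t \le t_0}$, with
state-space $I$ and jump rates $g_t(y, y')$, such that

\[ \mathbb{P}\Bigl( \sup_{t \le t_0} \|\mathbf{X}_t - x_t\| > \varepsilon \text{ or } Y_t \ne y_t \text{ for some } t \le \tau \Bigr)
   \le (G + \kappa) t_0 + 2d e^{-\delta^2/(2 A t_0)} + \mathbb{P}(\Omega_0^c \cup \Omega_1^c \cup \Omega_2^c \cup \Omega_3^c), \]

where $\tau = \inf\{ t \ge 0 : y_t \notin I_0 \} \wedge t_0$.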
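Proof. We construct the process $(X_t, y_t)_{t \le t_0}$ as a Markov chain, where the rates
are chosen to keep the processes $(Y_t)_{t \le t_0}$ and $(y_t)_{t \le t_0}$ together for as long as
possible. Define for $t \le t_0$, and for $\xi, \xi' \in S$ and $y, y' \in I$, with $(\xi, y) \ne (\xi', y')$,
first in the case $y = y(\xi)$,

\[ q_t(\xi, y; \xi', y') = \begin{cases}
  q(\xi, \xi'), & \text{if } y' = y(\xi') = y(\xi), \\
  q(\xi, \xi') \{ 1 \wedge ( g_t(y, y')/\gamma(\xi, y') ) \}, & \text{if } y' = y(\xi') \ne y(\xi), \\
  q(\xi, \xi') \{ 1 - ( g_t(y, y(\xi'))/\gamma(\xi, y(\xi')) ) \}^+, & \text{if } y' = y \ne y(\xi'), \\
  \{ g_t(y, y') - \gamma(\xi, y') \}^+, & \text{if } \xi' = \xi, \\
  0, & \text{otherwise},
\end{cases} \]

then in the case $y \ne y(\xi)$,

\[ q_t(\xi, y; \xi', y') = \begin{cases}
  q(\xi, \xi'), & \text{if } y' = y, \\
  g_t(y, y'), & \text{if } \xi' = \xi, \\
  0, & \text{otherwise}.
\end{cases} \]

Consider the Markov chain $(X_t, y_t)_{t \le t_0}$ on $S \times I$, starting from $(X_0, y(X_0))$, with
jump rates $q_t(\xi, y; \xi', y')$. It is straightforward to check, by calculation of the
marginal jump rates, that the components $(X_t)_{t \le t_0}$ and $(y_t)_{t \le t_0}$ are themselves
Markov chains, having jump rates $q(\xi, \xi')$ and $g_t(y, y')$ respectively. Set

\[ \tilde{T}_0 = \inf\{ t \ge 0 : Y_t \ne y_t \} \wedge t_0, \]

then $\tilde{T}_0 > 0$ and, for $t < t_0$, the hazard rate for $\tilde{T}_0$ is given by $\rho(t, X_{t-}, Y_{t-})$,
where

\[ \rho(t, \xi, y) = \sum_{y' \ne y} |\gamma(\xi, y') - g_t(y, y')|. \]

Thus, there is an exponential random variable $E$ of parameter 1 such that, on
$\{\tilde{T}_0 < t_0\}$,

\[ E = \int_0^{\tilde{T}_0} \rho(t, X_t, Y_t)\,dt. \]

On $\Omega_0 \cap \Omega_1 \cap \Omega_2' \cap \Omega_3$ we know that $\sup_{t \le t_0} \|\mathbf{X}_t - x_t\| \le \varepsilon$ so, if also $\tilde{T}_0 < \tau$,
then $\tilde{T}_0 \le T_0$ and so

\[ \int_0^{\tilde{T}_0} \rho(t, X_t, Y_t)\,dt \le (G + \kappa) t_0. \]

Hence $\mathbb{P}(\tilde{T}_0 < \tau \text{ and } \Omega_0 \cap \Omega_1 \cap \Omega_2' \cap \Omega_3) \le \mathbb{P}(E \le (G + \kappa) t_0) \le (G + \kappa) t_0$, which
combines with our earlier estimates to give the desired result. □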

There are at least two places where the basic argument used throughout this 
section is wasteful and where, with extra effort, better estimates could be ob- 
tained. First, we have treated the coordinate functions symmetrically; it may be 
that a rescaling of some coordinate functions would have the effect of equalizing 
the noise in each direction. This will tend to improve the estimates. Second, 
Gronwall's lemma is a blunt instrument. A better idea of how the perturbations 
introduced by the noise actually propagate is provided by differentiating the 
solution flow to the differential equation. Sometimes it is possible to show that, 
rather than growing exponentially, the effect of perturbations actually decays 
with time. These refinements are particularly relevant, respectively, when the 
dimension $d$ is large, and when the time horizon $t_0$ is large. We do not pursue
them further here. 



5. Stochastic epidemic 

We discuss this well known model, see for example [3], to show in a simple context
how the estimates of the preceding section lead quickly to useful asymptotic
results. The stochastic epidemic in a population of size $N$ is a Markov chain
$X = (X_t)_{t \ge 0}$ whose state-space $S$ is the set of pairs $\xi = (\xi^1, \xi^2)$ of non-negative
integers with $\xi^1 + \xi^2 \le N$. The non-zero jump rates, for distinct $\xi, \xi' \in S$, are
given by

\[ q(\xi, \xi') = \begin{cases}
  \lambda \xi^1 \xi^2 / N, & \text{if } \xi' = \xi + (-1, 1), \\
  \mu \xi^2, & \text{if } \xi' = \xi + (0, -1).
\end{cases} \]

Here $\lambda$ and $\mu$ are positive parameters, having the interpretation of infection and
removal rates, respectively. Write $X_t = (\xi^1_t, \xi^2_t)$. Then $\xi^1_t$ represents the number
of susceptible individuals at time $t$ and $\xi^2_t$ the number of infective individuals.
Suppose that initially a proportion $p \in (0, 1)$ of the population is infective, the
rest being susceptible. Thus $X_0 = (N(1 - p), Np)$. The choice of jump rates
arises from the modelling assumption that each susceptible individual encounters
randomly other members of the population, according to a Poisson process,
and becomes infective on first meeting an infective individual; then infectives
are removed at an exponential rate $\mu$. By a linear change of timescale we can
reduce to the case $\mu = 1$, so we shall assume that $\mu = 1$ from this point on.

5.1. Convergence to a limit differential equation

Define $\mathbf{x} : S \to \mathbb{R}^2$ by $\mathbf{x}(\xi) = \xi/N$ and set $\mathbf{X}_t = \mathbf{x}(X_t)$. Then the drift
vector is given by $\beta(\xi) = b(\mathbf{x}(\xi))$, where $b(x) = (-\lambda x^1 x^2, \lambda x^1 x^2 - x^2)$ and
$\phi(\xi, \theta) = \sigma_\theta(1/N)(\lambda \xi^1 \xi^2/N + \xi^2)$. Take $U = [0, 1]^2$ and set $x_0 = \mathbf{X}_0 = (1 - p, p)$. The
differential equation $\dot{x}_t = b(x_t)$, which is written in coordinates as

\[ \dot{x}^1_t = -\lambda x^1_t x^2_t, \qquad \dot{x}^2_t = \lambda x^1_t x^2_t - x^2_t, \]

has a unique solution $(x_t)_{t \ge 0}$, starting from $x_0$, which stays in $U$ for all time.
Note that $\mathbf{x}(S) \subseteq U$, so condition (2) holds for any $\varepsilon > 0$ and $t_0$. The Lipschitz
constant for $b$ on $U$ is given by $K = \lambda + \lambda \vee 1$. Set $A = (1 + \lambda)e/N$ and take
$\delta = e^{-K t_0}\varepsilon/3$ and $\theta = \delta/(A t_0)$, as in Section 4. Let us assume that $\varepsilon \le t_0$, then
$\theta \le N$, so $\sigma_\theta(1/N) \le \tfrac{1}{2}(\theta/N)^2 e$ (as in Footnote 4) and so

\[ \int_0^{T \wedge t_0} \phi(X_t, \theta)\,dt \le N \sigma_\theta(1/N)(\lambda + 1) t_0 \le \tfrac{1}{2}\theta^2 A t_0. \]

Hence, in this example, $\Omega_0 = \Omega_1 = \Omega_2 = \Omega$ and from Theorem 4.2 we obtain
the estimate

\[ \mathbb{P}\Bigl( \sup_{t \le t_0} \|\mathbf{X}_t - x_t\| > \varepsilon \Bigr) \le 4 e^{-N\varepsilon^2/C}, \tag{3} \]

where $C = 18(\lambda + 1) t_0 e^{2 K t_0 + 1}$. Figure 2 illustrates a realization of the process
alongside the solution of the differential equation.




Fig 2. The graphic shows the proportions of susceptible and infective individuals in a
population of 1000, of which initially 900 are susceptible and 100 are infective. The parameter
values are $\lambda = 5$ and $\mu = 1$. One realization of the Markov chain, and the solution of the
differential equation, are shown at 1 : 1000 scale.
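
A comparison of the kind shown in Figure 2 can be reproduced along the following
lines. This is a minimal sketch under stated assumptions, not the code used for the
figure: the function names, the seed, the Runge-Kutta discretisation and the
restriction of the comparison to a time grid are all choices made here for
illustration. The jump rates are those given at the start of this section, with
$\mu = 1$.

```python
import random


def epidemic_drift(x, lam):
    """Drift b(x) = (-lam*x1*x2, lam*x1*x2 - x2) of the rescaled epidemic (mu = 1)."""
    s, i = x
    return (-lam * s * i, lam * s * i - i)


def rk4_step(x, h, lam):
    """One classical Runge-Kutta step for dx/dt = b(x)."""
    k1 = epidemic_drift(x, lam)
    k2 = epidemic_drift((x[0] + 0.5 * h * k1[0], x[1] + 0.5 * h * k1[1]), lam)
    k3 = epidemic_drift((x[0] + 0.5 * h * k2[0], x[1] + 0.5 * h * k2[1]), lam)
    k4 = epidemic_drift((x[0] + h * k3[0], x[1] + h * k3[1]), lam)
    return tuple(x[j] + h * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j]) / 6.0 for j in range(2))


def sup_deviation_epidemic(n, p, lam, t_end, steps=2000, seed=0):
    """Largest sup-norm gap between chain and ODE over a grid of `steps` times."""
    rng = random.Random(seed)
    inf = round(n * p)                 # infectives
    sus = n - inf                      # susceptibles
    x = (sus / n, inf / n)             # ODE started from the chain's initial point
    h = t_end / steps
    t_chain, worst = 0.0, 0.0
    for k in range(1, steps + 1):
        t_grid = k * h
        while True:                    # run the chain up to time t_grid
            infection_rate = lam * sus * inf / n
            removal_rate = float(inf)
            total = infection_rate + removal_rate
            if total == 0.0:
                break                  # no infectives left: the chain is absorbed
            wait = rng.expovariate(total)
            if t_chain + wait > t_grid:
                break                  # memorylessness lets us restart the clock at t_grid
            t_chain += wait
            if rng.random() * total < infection_rate:
                sus, inf = sus - 1, inf + 1
            else:
                inf -= 1
        t_chain = t_grid
        x = rk4_step(x, h, lam)
        worst = max(worst, abs(sus / n - x[0]), abs(inf / n - x[1]))
    return worst


print(sup_deviation_epidemic(n=1000, p=0.1, lam=5.0, t_end=3.0, seed=1))
```

The returned value is a grid approximation to $\sup_{t \le t_0} \|\mathbf{X}_t - x_t\|$; repeating
the run for increasing $N$ gives an empirical counterpart to estimate (3).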



5.2. Convergence of the terminal value 

The estimate just obtained, whilst giving strong control of error probabilities
as $N$ becomes large, behaves rather poorly as a function of $t_0$. This is because
we have used the crude device of Gronwall's lemma rather than paying closer
attention to the stability properties of the differential equation $\dot{x}_t = b(x_t)$. In
particular, the estimate is useless if we want to predict the final size of the
epidemic, that is to say, the proportion of the population which is eventually
infected, given by $X^3_\infty = \lim_{t \to \infty} X^3_t$, where $X^3_t = 1 - X^1_t - X^2_t$. However, we can
obtain an estimate on $X^3_\infty$ by the following modified approach. Let us change the
non-zero jump rates by setting $\tilde{q}(\xi, \xi') = q(\xi, \xi')/\xi^2$, for $\xi = (\xi^1, \xi^2)$, to obtain a
new process $(\tilde{X}_t)_{t \ge 0}$. Since we have changed only the time-scale, the final values
$X^3_\infty$ and $\tilde{X}^3_\infty$ have the same distribution. We can now re-run the analysis, just
done for $X$, to $\tilde{X}$. Using obvious notation, we have $\tilde{b}(x) = (-\lambda x^1, \lambda x^1 - 1)$
and $\tilde{\phi}(\xi, \theta) = \sigma_\theta(1/N)(\lambda \xi^1/N + 1)$. We now take $U = (0, 1]^2$. The Lipschitz
constant $K$ is unchanged. We make the obvious extension of $\tilde{b}$ to $\mathbb{R}^2$. By explicit
solution of the differential equation, we see that $(\tilde{x}_t)_{t \ge 0}$ leaves $U$ at time $\tau$, with
$\tilde{x}^3_\tau = 1 - \tilde{x}^1_\tau - \tilde{x}^2_\tau = \tau$, where $\tau$ is the unique root of the equation

\[ \tau + (1 - p) e^{-\lambda\tau} = 1. \]

Moreover $\tilde{b}^2(\tilde{x}_\tau) = \lambda \tilde{x}^1_\tau - 1 < 0$, so $\tilde{b}(\tilde{x}_\tau)$ is not tangent to the boundary, and
so $\varepsilon + \rho(\varepsilon) \le C\varepsilon$ for all $\varepsilon \in (0, 1]$ for some $C < \infty$ depending only on $\lambda$ and $p$.
We can therefore choose $t_0 > \tau$ and apply Theorem 4.3 to obtain, for a constant
$C < \infty$ of the same dependence, for all $\varepsilon \in (0, 1]$ and all $N$,

\[ \mathbb{P}\bigl( |X^3_\infty - \tau| > \varepsilon \bigr) \le 4 e^{-N\varepsilon^2/C}. \]

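
The limiting final size $\tau$ has no closed form, but the equation
$\tau + (1 - p)e^{-\lambda\tau} = 1$ is easily solved numerically. The sketch below
uses bisection on $[0, 1]$; the function name `final_size` and the tolerance are
assumptions for illustration. Bisection applies because the left-hand side minus 1
is convex, equals $-p < 0$ at $\tau = 0$ and is positive at $\tau = 1$.

```python
import math


def final_size(lam: float, p: float, tol: float = 1e-12) -> float:
    """Root in (0, 1) of tau + (1 - p) * exp(-lam * tau) = 1, by bisection."""

    def f(tau: float) -> float:
        return tau + (1.0 - p) * math.exp(-lam * tau) - 1.0

    lo, hi = 0.0, 1.0          # f(lo) = -p < 0 and f(hi) = (1 - p) * exp(-lam) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Parameters of Figure 2: lambda = 5 and initial infective proportion p = 0.1.
print(final_size(lam=5.0, p=0.1))
```

For the parameters of Figure 2 the root lies just below 1, so almost the whole
population is eventually infected.
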
5.3. Limiting behaviour of individuals 

We finally give an alternative analysis which yields a more detailed picture. Consider
a Markov chain $\bar{X} = (\bar{X}_t)_{t \ge 0}$ with state-space $\bar{S}$ consisting of $N$-vectors
$\eta = (\eta^1, \dots, \eta^N)$ with $\eta^j \in \{1, 2, 3\}$ for all $j$. Each component of $\eta$ represents
the state of an individual member of the population, state 1 corresponding to
susceptible, state 2 to infective, and 3 to removed. The non-zero jump rates, for
distinct $\eta, \eta' \in \bar{S}$, are given by

\[ \bar{q}(\eta, \eta') = \begin{cases}
  \lambda \xi^2(\eta)/N, & \text{if } \eta' = \eta + e_j \text{ for some } j \text{ with } \eta^j = 1, \\
  1, & \text{if } \eta' = \eta + e_j \text{ for some } j \text{ with } \eta^j = 2.
\end{cases} \]

Here $\xi^i(\eta) = |\{ j : \eta^j = i \}|$, $i = 1, 2$, and $e_j = e_j^{(N)} = (0, \dots, 1, \dots, 0)$ is
the elementary $N$-vector with a 1 in the $j$th position. Set $X_t = \xi(\bar{X}_t)$. Then
$X = (X_t)_{t \ge 0}$ is the stochastic epidemic considered above. Define $\bar{\mathbf{x}} : \bar{S} \to \mathbb{R}^2$ by
$\bar{\mathbf{x}}(\eta) = \mathbf{x}(\xi(\eta))$. Then $\bar{\mathbf{x}}(\bar{X}_t) = \mathbf{x}(X_t) = \mathbf{X}_t$, which we already know to remain
close to $x_t$ with high probability when $N$ is large.

We can now describe the limiting behaviour of individual members of the
population. Fix $k \in \{1, \dots, N\}$ and set $I = \{1, 2, 3\}^k$. Define $y : \tilde S \to I$ by
$y(\eta) = (\eta^1, \dots, \eta^k)$ and set $Y_t = y(\tilde X_t)$. We seek to apply Theorem 4.4. Define
for $x \in U$ and $n, n' \in \{1, 2, 3\}$

\[ g_0(x, n, n') = \begin{cases} \lambda x^2, & \text{if } n = 1 \text{ and } n' = 2, \\ 1, & \text{if } n = 2 \text{ and } n' = 3, \\ 0, & \text{otherwise,} \end{cases} \]

and, for $y, y' \in I$, set $g(x, y, y') = g_0(x, y^j, y'^j)$, where $j$ denotes the component in
which $y$ and $y'$ differ. Then the jump rates for $Y$ are given by
$\gamma(\eta, y') = g(\tilde x(\eta), y(\eta), y')$, so we can take $G = 0$ and $\Omega_3 = \Omega$,
and it is straightforward to check that, if $I_0 = I$, then $\kappa = k\lambda\varepsilon$. Hence there is
a time-inhomogeneous Markov chain $(y_t)_{t \ge 0}$ with state-space $I$ and jump rates
$g_t(y, y') = g(x_t, y, y')$, $y, y' \in I$, such that

\[ \mathbb{P}(Y_t \ne y_t \text{ for some } t \le t_0) \le k\lambda\varepsilon t_0 + 4e^{-N\varepsilon^2/C}. \]

A roughly optimal choice of $\varepsilon$ is $\sqrt{C \log N/N}$, giving a constant $C' < \infty$,
depending only on $\lambda$ and $t_0$, such that

\[ \mathbb{P}(Y_t \ne y_t \text{ for some } t \le t_0) \le C' k \sqrt{\log N/N} \]

for all sufficiently large $N$. Note that the components of $(y_t)_{t \ge 0}$ are independent.
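
The trade-off behind the choice $\varepsilon = \sqrt{C \log N/N}$ can be checked numerically. The following is only a sketch: the function names and the values of $k$, $\lambda$, $t_0$ and $C$ are hypothetical placeholders, not constants from the estimate. With this choice the exponential term equals $4/N$, so the linear term $k\lambda\varepsilon t_0$ dominates.

```python
import math

def coupling_bound(eps, N, k, lam, t0, C):
    """Right-hand side k*lam*eps*t0 + 4*exp(-N*eps^2/C) of the coupling estimate."""
    return k * lam * eps * t0 + 4.0 * math.exp(-N * eps ** 2 / C)

def rough_eps(N, C):
    """The roughly optimal choice eps = sqrt(C log N / N)."""
    return math.sqrt(C * math.log(N) / N)

if __name__ == "__main__":
    # Hypothetical constants, chosen only for illustration.
    k, lam, t0, C = 5, 2.0, 1.0, 10.0
    for N in (10**4, 10**6, 10**8):
        eps = rough_eps(N, C)
        print(N, eps, coupling_bound(eps, N, k, lam, t0, C))
```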



6. Population processes 

The modelling of population dynamics, involving a number of interacting species, 
is an important application of Markov chains. A simple example of this was al- 
ready discussed in Section 5. We propose now to consider another example, of a 
model which has been used for the growth of a virus in a cell. Our primary aim 
here is to show how to deal with a Markov chain where some components, the 
slow variables, can be approximated by the solution to a differential equation but 
others, the fast variables, instead oscillate rapidly and randomly. Specifically, by 
a non-standard choice of coordinate functions, we can obtain an approximation 
for the slow variables, with computable error probabilities. 

A population process is a Markov chain $X = (X_t)_{t \ge 0}$, where the state
$X_t = (\xi_t^1, \dots, \xi_t^n)$ describes the number of individuals in each of $n$ species at time $t$; the
dynamics are specified by a choice of rates $\lambda_{\epsilon,\epsilon'}$ for each of the possible reactions
$(\epsilon, \epsilon')$, where $\epsilon, \epsilon' \in (\mathbb{Z}^+)^n$; then, independently over reactions, $X$ makes jumps
of size $\epsilon' - \epsilon$ at rate

\[ \lambda_{\epsilon,\epsilon'} \prod_{i=1}^n \binom{\xi^i}{\epsilon^i}. \]

The sort of analysis done below can be adapted to many other population processes.



6.1. Analysis of a model for viral replication and growth 

We learned of this model from the paper [2], which contains further references on
the scientific background. There are three species $G$, $T$ and $P$ which represent,
respectively, the genome, template, and structural protein of a virus. We denote
by $\xi^1, \xi^2, \xi^3$ the respective numbers of molecules of each type. There are six
reactions, forming a process which may lead from a single virus genome to a
sustained population of all three species and to the production of the virus. We
write the reactions as follows:

\[ G \xrightarrow{\lambda} T, \qquad T \xrightarrow{R/\alpha} \emptyset, \qquad T \xrightarrow{R} T + G, \]
\[ T \xrightarrow{RN} T + P, \qquad P \xrightarrow{R/\mu} \emptyset, \qquad G + P \xrightarrow{\nu/N} \emptyset. \]

Here, $\alpha > 1$, $R \ge 1$, $N \ge 1$ and $\lambda, \mu, \nu > 0$ are given parameters and, for
example, the third reaction corresponds, in the general notation used above, to
the case $\epsilon = (0, 1, 0)$ and $\epsilon' = (1, 1, 0)$, with $\lambda_{\epsilon,\epsilon'} = R$, whereas the final reaction,
which causes jumps of size $(-1, 0, -1)$, occurs at a total rate of $\nu\xi^1\xi^3/N$. We
have omitted some scientific details which are irrelevant to the mathematics,
and have written $\emptyset$ when the reaction produces none of the three species in the
model. In fact it is the final reaction $G + P \to \emptyset$ which gives rise to the virus itself.
In the case of scientific interest, $\alpha, \lambda, \mu, \nu$ are of order 1, but $R, N$ are large. We
therefore seek an approximation which is good in this regime.
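
To make the six reactions concrete, here is a minimal Gillespie-style simulation sketch. It assumes the reaction rates $\lambda$, $R/\alpha$, $R$, $RN$, $R/\mu$ and $\nu/N$ in the order listed above; the function names and the parameter values used in any call are illustrative assumptions, not part of the model description in [2].

```python
import random

def reactions(xi, alpha, lam, mu, nu, R, N):
    """Total rates and jump vectors of the six reactions in state xi = (G, T, P)."""
    g, t, p = xi
    return [
        (lam * g,        (-1, 1, 0)),   # G -> T
        (R / alpha * t,  (0, -1, 0)),   # T -> (nothing)
        (R * t,          (1, 0, 0)),    # T -> T + G
        (R * N * t,      (0, 0, 1)),    # T -> T + P
        (R / mu * p,     (0, 0, -1)),   # P -> (nothing)
        (nu / N * g * p, (-1, 0, -1)),  # G + P -> (nothing)
    ]

def simulate(xi0, t_end, params, seed=0):
    """Run the jump process from xi0 up to time t_end and return the final state."""
    rng = random.Random(seed)
    t, xi = 0.0, tuple(xi0)
    while True:
        table = reactions(xi, **params)
        total = sum(rate for rate, _ in table)
        if total == 0:
            return xi                      # absorbed: no reaction can fire
        t += rng.expovariate(total)        # exponential holding time
        if t > t_end:
            return xi
        jump = rng.choices([j for _, j in table],
                           weights=[r for r, _ in table])[0]
        xi = tuple(a + b for a, b in zip(xi, jump))
```

For example, `simulate((5, 0, 0), 0.5, dict(alpha=2.0, lam=1.0, mu=1.0, nu=1.0, R=10, N=10))` runs the chain from five genomes for half a time unit. Since the fast reactions fire at rates of order $RN$, the number of simulated events grows quickly with $R$ and $N$.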

As a first step to understanding this process, we note that, for as long as the
number of templates $\xi_t^2$ remains of order 1, the rate of production of genomes
is of order $R$. On the other hand, for as long as the number of genomes $\xi_t^1$ is
bounded by $xR$, for some $x > 0$, the number of templates can be dominated$^7$
by an $M/M/\infty$ queue $(Y_t)_{t \ge 0}$ with arrival rate $\lambda xR$ and service rate $R/\alpha$. The
stationary distribution for $(Y_t)_{t \ge 0}$ is Poisson of parameter $\lambda x\alpha$, which suggests
that, for reasonable initial conditions at least, $\xi_t^2$ does remain of order 1, but
oscillates rapidly, on a time-scale of order $1/R$. The number of proteins $\xi_t^3$ evolves
as an $M/M/\infty$ queue, with time-dependent arrival rate $RN\xi_t^2$ and service rate
$R/\mu + \nu\xi_t^1/N$. This suggests that $\xi_t^3/N$ will track closely a function of the rapidly
oscillating process $\xi_t^2$.

$^7$ The obvious additivity property for arrival rates of $M/M/\infty$ queues having a common
service rate extends to the case of previsible arrival rates. A good way to see this is by
constructing all queues from a single Poisson random measure.

The only hope for a differential equation approximation would thus appear
to be the genome process $(\xi_t^1)_{t \ge 0}$. The obvious choice of coordinate map
$x(\xi) = \xi^1/R$ gives as drift

\[ \beta(\xi) = \sum_{\xi' \ne \xi} (x(\xi') - x(\xi))\, q(\xi, \xi') = -\lambda\frac{\xi^1}{R} + \xi^2 - \nu\frac{\xi^1\xi^3}{RN}, \]

which we cannot approximate by a function of $x(\xi)$ unless the second and third
terms become negligible. In fact they do not, so this choice fails. The problem
is that the drift of $\xi_t^1$ is significantly dependent on the fast variables $\xi_t^2$ and $\xi_t^3$.
To overcome this, we can attempt to compensate the coordinate process so that
it takes account of this effect. We seek to find a function $x$ on the state-space
$S = (\mathbb{Z}^+)^3$ of the form

\[ x(\xi) = \frac{\xi^1}{R} + \chi(\xi), \]

where $\chi$ is a small correction, chosen so that the drift vector $\beta(\xi)$ has the form

\[ \beta(\xi) = b(x(\xi)) + \frac{\Delta(\xi)}{R}, \]

where again $\Delta(\xi)/R$ is small when $R$ is large. Small here refers to a typical
evaluation on the process, where we recall that we expect $\xi_t^1/R$, $\xi_t^2$ and $\xi_t^3/N$
to be of order 1. It is reasonable to search for a function $\chi$ which is affine in $\xi^1$
and linear in $(\xi^2, \xi^3)$. After some straightforward calculations, we find that

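\[ \chi(\xi) = \alpha\frac{\xi^2}{R} - \mu\nu\frac{\xi^1\xi^3}{R^2N} - \alpha\mu\nu\frac{\xi^1\xi^2}{R^2} \]

has the desired property, with

\[ b(x) = \lambda(\alpha - 1)x - \lambda\alpha\mu\nu x^2 \]

and

\[ \begin{aligned} \Delta(\xi) = {} & \lambda\mu\nu\frac{\xi^1\xi^3}{RN} + \alpha\lambda\mu\nu(\xi^2 + 1)\frac{\xi^1}{R} - \mu\nu\xi^2\frac{\xi^3}{N} - \alpha\mu\nu(\xi^2)^2 + \alpha\mu\nu^2\frac{\xi^1}{R}\xi^2\frac{\xi^3}{N} \\ & + \mu\nu^2\frac{\xi^1}{R}\frac{\xi^3}{N}\frac{\xi^1 + \xi^3 - 1}{N} - \lambda(\alpha - 1)R\chi(\xi) + \lambda\alpha\mu\nu\left(2R\chi(\xi)\frac{\xi^1}{R} + R\chi(\xi)^2\right). \end{aligned} \]

The limit differential equation

\[ \dot x_t = \lambda(\alpha - 1)x_t - \lambda\alpha\mu\nu x_t^2 \]

has a unique positive fixed point $x_\infty = (\alpha - 1)/(\alpha\mu\nu)$. Fix $x_0 \in [0, x_\infty]$ and take
as initial state $X_0 = (Rx_0, 0, 0)$.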
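
The limit equation is a logistic equation, so its behaviour is easy to check numerically. The sketch below integrates it with a classical Runge-Kutta scheme and compares the long-time value with the fixed point $(\alpha - 1)/(\alpha\mu\nu)$. The function names, the step count and the parameter values in the usage line are assumptions made for illustration.

```python
def b(x, alpha, lam, mu, nu):
    """Limiting drift b(x) = lam*(alpha-1)*x - lam*alpha*mu*nu*x^2."""
    return lam * (alpha - 1.0) * x - lam * alpha * mu * nu * x * x

def fixed_point(alpha, mu, nu):
    """Unique positive zero of b."""
    return (alpha - 1.0) / (alpha * mu * nu)

def solve(x0, t_end, alpha, lam, mu, nu, steps=10000):
    """Integrate dx/dt = b(x) from x0 over [0, t_end] with classical RK4."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = b(x, alpha, lam, mu, nu)
        k2 = b(x + 0.5 * h * k1, alpha, lam, mu, nu)
        k3 = b(x + 0.5 * h * k2, alpha, lam, mu, nu)
        k4 = b(x + h * k3, alpha, lam, mu, nu)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x
```

With the illustrative values $\alpha = 2$ and $\lambda = \mu = \nu = 1$, `solve(0.05, 30.0, 2.0, 1.0, 1.0, 1.0)` should be very close to `fixed_point(2.0, 1.0, 1.0)`.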

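Theorem 6.1. For all $t_0 \in [1, \infty)$, there is a constant $C < \infty$, depending only
on $\alpha, \lambda, \mu, \nu, t_0$, with the following property. For all $\varepsilon \in (0, 1]$ there is a constant
$R_0 < \infty$, depending only on $\alpha, \lambda, \mu, \nu, t_0$ and $\varepsilon$ such that, for all $R \ge R_0$ and
$N \ge R$, we have

\[ \mathbb{P}\left( \sup_{t \le t_0} \left| \frac{\xi_t^1}{R} - x_t \right| > \varepsilon \right) \le e^{-R\varepsilon^2/C}. \]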



Proof. We shall write $C$ for a finite constant depending only on $\alpha, \lambda, \mu, \nu, t_0$,
whose value may vary from line to line, adding a subscript when we wish
to refer to a particular value at a later point. Fix constants $a \ge 1$, $\gamma > 0$,
$\Gamma \ge 1$, with $(\alpha + [\ldots] + 1)\gamma \le 1/2$, to be determined later, and set $A = a/R$.
Take $U = [0, x_\infty + 1]$. As in Section 4, let us write $K$ for the Lipschitz
constant of $b$ on $U$, and set $\mathbf{X}_t = x(X_t)$, $\delta = \varepsilon e^{-Kt_0}/3$, $\theta = \delta/(At_0)$ and
$T = \inf\{t \ge 0 : \mathbf{X}_t \notin U\}$. Since $0 \le x_t \le x_\infty$ for all $t$ and since $\varepsilon \le 1$, condition
(2) holds.

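Consider the events

\[ \Omega_4 = \left\{ \sup_{t \le T \wedge t_0} \xi_t^2 \le \gamma R \text{ and } \sup_{t \le T \wedge t_0} \xi_t^3 \le \gamma RN \right\} \]

and

\[ \Omega_5 = \left\{ \int_0^{T \wedge t_0} \xi_t^2\,dt \le \Gamma \text{ and } \int_0^{T \wedge t_0} \xi_t^3\,dt \le \Gamma N \right\}. \]

We refer to Subsection 4.2 for the definition of the events $\Omega_0, \Omega_1, \Omega_2$. We now
show that, for suitable choices of the constants $a, \gamma, \Gamma$, we have
$\Omega_4 \cap \Omega_5 \subseteq \Omega_0 \cap \Omega_1 \cap \Omega_2$.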

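For $t \le T \wedge t_0$, on $\Omega_4$, we have

\[ \mathbf{X}_t = \frac{\xi_t^1}{R}\left(1 - \alpha\mu\nu\frac{\xi_t^2}{R} - \mu\nu\frac{\xi_t^3}{RN}\right) + \alpha\frac{\xi_t^2}{R} \ge \frac{\xi_t^1}{2R}, \]

so $\xi_t^1/R \le 2(x_\infty + 1)$.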



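On $\Omega_4$ (without using the assumptions $\xi_0^2 = \xi_0^3 = 0$), we have

\[ |\mathbf{X}_0 - x_0| = |\chi(X_0)| = \left| \alpha\frac{\xi_0^2}{R} - \mu\nu\frac{\xi_0^1\xi_0^3}{R^2N} - \alpha\mu\nu\frac{\xi_0^1\xi_0^2}{R^2} \right| \le C_0\gamma, \tag{4} \]

so, provided that $C_0\gamma \le \delta$, we have $\Omega_4 \subseteq \Omega_0$.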
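On $\Omega_4 \cap \Omega_5$, we have

\[ \int_0^{T \wedge t_0} |\beta(X_t) - b(x(X_t))|\,dt = \frac{1}{R}\int_0^{T \wedge t_0} |\Delta(X_t)|\,dt \le \frac{C_1}{R}(1 + R\gamma\Gamma). \]

So, provided that $C_1(\gamma\Gamma + 1/R) \le \delta$, we have $\Omega_4 \cap \Omega_5 \subseteq \Omega_1$.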

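For $\xi \in S$ with $\xi^1 \le 2(x_\infty + 1)R$, $\xi^2 \le \gamma R$ and $\xi^3 \le \gamma RN$, and for any $\xi' \in S$
with $q(\xi, \xi') > 0$, we have $|\xi'^i - \xi^i| \le 1$ and hence

\[ |x(\xi') - x(\xi)| \le \frac{1}{R} + |\chi(\xi') - \chi(\xi)| \le \frac{C}{R}, \]

and indeed, for $\xi' = \xi \pm (0, 0, 1)$ we have $|x(\xi') - x(\xi)| \le C/(RN)$, so, using

\[ \sigma_\theta(x(\xi') - x(\xi)) \le \tfrac{1}{2} e^{C\theta/R}\theta^2 |x(\xi') - x(\xi)|^2, \]

we obtain, after some straightforward estimation,

\[ \phi(\xi, \theta) \le \frac{1}{2} e^{C\theta/R}\frac{C\theta^2}{R}\left(1 + \xi^2 + \frac{\xi^3}{N}\right). \]

So, on $\Omega_4 \cap \Omega_5$, we have

\[ \int_0^{T \wedge t_0} \phi(X_t, \theta)\,dt \le \frac{1}{2} e^{C\theta/R}\frac{C\theta^2}{R}\int_0^{T \wedge t_0}\left(1 + \xi_t^2 + \frac{\xi_t^3}{N}\right)dt \le \frac{1}{2} e^{C_2\theta/R}\frac{C_2\Gamma\theta^2}{R}. \]

So, provided $a \ge C_2\Gamma e$, we have $C_2\theta/R \le 1$ and so $\Omega_4 \cap \Omega_5 \subseteq \Omega_2$.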

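From equation (4) we obtain $|\xi_t^1/R - \mathbf{X}_t| \le C_3\gamma$. Let us choose then $a, \gamma, \Gamma$
and $R$ so that $C_0\gamma \le \delta$, $C_1(\gamma\Gamma + 1/R) \le \delta$, $a \ge C_2\Gamma e$ and $C_3\gamma \le \varepsilon$. Then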



On the other hand, $\Omega_4 \cap \Omega_5 \subseteq \Omega_0 \cap \Omega_1 \cap \Omega_2$ and, by Theorem 4.2, we have



Since $2e^{-\delta^2/(2At_0)} = 2e^{-R\varepsilon^2/C}$, where $C = 18at_0e^{2Kt_0}$, we can now complete the
proof by showing that, for suitable $a, \gamma, \Gamma$ and $R_0$, for all $R \ge R_0$, we have
$\mathbb{P}(\Omega_4^c) \le e^{-R}$ and $\mathbb{P}(\Omega_5^c) \le e^{-R}$.

We can dominate the processes $(\xi_t^2)_{t \ge 0}$ and $(\xi_t^3)_{t \ge 0}$, up to $T$, by a pair of
processes $Y = (Y_t)_{t \ge 0}$ and $Z = (Z_t)_{t \ge 0}$, respectively, where $Y$ is an $M/M/\infty$
queue with arrival rate $2\lambda(x_\infty + 1)R$ and service rate $R/\alpha$, starting from $\xi_0^2 = 0$,
and where, conditional on $Y$, $Z$ is an $M/M/\infty$ queue with arrival rate $RNY_t$ and
service rate $R/\mu$, starting from $\xi_0^3 = 0$. We now use the estimates (5) and (6),
to be derived in the next subsection. For $\Gamma$ sufficiently large, using the estimate
(6), we have $\mathbb{P}(\Omega_5^c) \le e^{-R}$ for all sufficiently large $R$. Fix such a $\Gamma$ and choose
$a, R$ sufficiently large and $\gamma$ sufficiently small to satisfy the above constraints.
Finally, using the estimate (5), $\mathbb{P}(\Omega_4^c) \le e^{-R}$, for all sufficiently large $R$. □

The initial state $(Rx_0, 0, 0)$ was chosen to simplify the presentation and is not
realistic. However, an examination of the proof shows that, for some constant
$\gamma > 0$, depending only on $\alpha, \lambda, \mu, \nu, \varepsilon$, the same conclusion can be drawn for any
initial state $(Rx_0, \xi_0^2, \xi_0^3)$ with $x_0 \le x_\infty$, $\xi_0^2 \le R\gamma$ and $\xi_0^3 \le RN\gamma$. Since typical
values of the fast variables $\xi_t^2$ and $\xi_t^3$ are of order 1 and $N$ respectively, this is
more realistic. Although we are free to take an initial state $(1, 0, 0)$, the action
of interest in this case occurs at a time of order $\log R$, so is not covered by our
result. Instead, there is a branching process approximation for the number of
genomes, valid until it reaches $Rx_0$, for small $x_0$. Our estimate can be applied
to the evolution of the process from that time on. See [2] for more details of the
branching process approximation.

6.2. Some estimates for the M/M/∞ queue

We now derive the fast variable bounds used in the proof of Theorem 6.1. They
are based on the following two estimates for the $M/M/\infty$ queue.

Proposition 6.2. Let $(X_t)_{t \ge 0}$ be an $M/M/\infty$ queue starting from $x_0$, with
arrival rate $\lambda$ and service rate $\mu$. Then, for all $t \ge 1/\mu$ and all $a \ge 3\lambda e^2/\mu$,

\[ \mathbb{P}\left( \sup_{s \le t} X_s \ge x_0 + \log(\mu t) + a \right) \le \exp\left\{ -a \log\left( \frac{\mu a}{3\lambda e} \right) \right\}. \]

Proof. By rescaling time, we reduce to the case where $\mu = 1$. Also, we have
$X_t \le x_0 + Y_t$, where $(Y_t)_{t \ge 0}$ is an $M/M/\infty$ queue starting from 0, with the same
arrival rate and service rate. Thus we are reduced to the case where $x_0 = 0$.

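Choose $n \in \mathbb{N}$ so that $\delta = t/n \in [1, 2)$ and note that, for $k = 0, 1, \dots, n - 1$,
we have

\[ \sup_{k\delta \le s \le (k+1)\delta} X_s \le X_{k\delta} + A_{k+1} \le Y_k + A_{k+1}, \]

where $A_{k+1}$ is the number of arrivals in $(k\delta, (k+1)\delta]$ and where $Y_k$ is a Poisson
random variable of parameter $\lambda$, independent of $A_{k+1}$. By the usual Poisson tail
estimate$^8$, for all $x \ge 0$,

\[ \mathbb{P}\left( \sup_{k\delta \le s \le (k+1)\delta} X_s \ge x \right) \le \exp\left\{ -x \log\left( \frac{x}{\lambda(1 + \delta)e} \right) \right\}. \]

Hence, for $t \ge 1$ and $a \ge 3\lambda e^2$,

\[ \mathbb{P}\left( \sup_{s \le t} X_s \ge \log t + a \right) \le n \exp\left\{ -(a + \log t) \log\left( \frac{a}{3\lambda e} \right) \right\} \le \exp\left\{ -a \log\left( \frac{a}{3\lambda e} \right) \right\}. \]

□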
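
The Poisson tail estimate used in this proof, $\mathbb{P}(X \ge x) \le \exp\{-x \log(x/(\lambda e))\}$ for a Poisson random variable $X$ of parameter $\lambda$, can be compared with the exact tail. The sketch below does this for integer $x$; the function names and the truncation of the tail sum after a fixed number of terms are choices made here for illustration.

```python
import math

def poisson_tail(lam, x, extra_terms=200):
    """P(X >= x) for X ~ Poisson(lam), integer x >= 0.

    The infinite sum is truncated after `extra_terms` terms, which is
    negligible for moderate lam; each term is computed on the log scale.
    """
    return sum(
        math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
        for j in range(x, x + extra_terms)
    )

def tail_bound(lam, x):
    """The bound exp{-x log(x/(lam e))}, read as 1 at x = 0."""
    if x == 0:
        return 1.0
    return math.exp(-x * math.log(x / (lam * math.e)))
```

For $x \le \lambda$ the bound exceeds 1 and is trivially true; it becomes informative only once $x$ is larger than $\lambda e$.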



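Proposition 6.3. Let $(X_t)_{t \ge 0}$ be an $M/M/\infty$ queue starting from $x_0$, with
time-dependent arrival rate $\lambda_t$ and service rate $\mu$. Then, for all $t \ge 0$ and all
$\theta \in [0, \mu)$,

\[ \mathbb{E}\left( \exp\left\{ \theta \int_0^t X_s\,ds \right\} \right) \le \left( \frac{\mu}{\mu - \theta} \right)^{x_0} \exp\left\{ \frac{\theta}{\mu - \theta}\int_0^t \lambda_s\,ds \right\}. \]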

Proof. By rescaling time, we reduce to the case where $\mu = 1$. Consider first the
case where $\lambda_t = 0$. Then

\[ \int_0^t X_s\,ds \le S_1 + \dots + S_{x_0}, \]

where $S_n$ is the service time of the $n$th customer present at time 0. The result
follows by an elementary calculation.

Consider next the case where $x_0 = 0$. We can express $X_t$ in terms of a Poisson
random measure $m$ on $[0, \infty) \times [0, 1]$ of intensity $\lambda_t\,dt\,du$, thus

\[ X_t = \int_0^t \int_0^1 1_{\{u \le e^{-(t-s)}\}}\, m(ds, du). \]

Then, by Fubini,

\[ \int_0^t X_s\,ds \le \int_0^t \int_0^1 \log\left( \frac{1}{u} \right) m(ds, du). \]

By Campbell's formula,

\[ \mathbb{E}\left( \exp\left\{ \theta \int_0^t \int_0^1 \log\left( \frac{1}{u} \right) m(ds, du) \right\} \right) = \exp\left\{ \frac{\theta}{1 - \theta}\int_0^t \lambda_s\,ds \right\}. \]

The result for $x_0 = 0$ follows. The general case now follows by independence. □

$^8$ For a Poisson random variable $X$ of parameter $\lambda$, we have $\mathbb{P}(X \ge x) \le \exp\{-x \log(\frac{x}{\lambda e})\}$,
for all $x \ge 0$.

Now let $Y = (Y_t)_{t \ge 0}$ be an $M/M/\infty$ queue with arrival rate $\lambda R$ and service
rate $R/\alpha$, starting from $Ry$, and, conditional on $Y$, let $(Z_t)_{t \ge 0}$ be an $M/M/\infty$
queue with arrival rate $RNY_t$ and service rate $R/\mu$, starting from $RNz$. By
Proposition 6.2, for any $t_0 \ge 0$ and $\gamma > 0$, we can find $R_0 < \infty$, depending only
on $\alpha, \gamma, \lambda, \mu$ and $t_0$, such that, for all $R \ge R_0$ and $N \ge 1$,

\[ \mathbb{P}\left( \sup_{t \le t_0} Y_t \ge (\gamma + y)R \text{ or } \sup_{t \le t_0} Z_t \ge (\gamma + y + z)RN \right) \le e^{-R}. \tag{5} \]

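On the other hand, using Proposition 6.3, for $\theta\alpha \le 1/2$,

\[ \mathbb{E}\left( \exp\left\{ R\theta \int_0^{t_0} Y_t\,dt \right\} \right) \le \left( \frac{1}{1 - \alpha\theta} \right)^{Ry} \exp\left\{ \frac{\lambda\alpha\theta R t_0}{1 - \alpha\theta} \right\} \le e^{CR(1 + y)} \]

for a constant $C$ depending on $\alpha, \lambda, t_0$. Then, by conditioning first on $Y$, we
obtain, for all $\theta(\mu + 1)\alpha \le 1/2$,

\[ \mathbb{E}\left( \exp\left\{ \frac{R\theta}{N}\int_0^{t_0} Z_t\,dt \right\} \right) \le e^{CR(1 + y + z)}, \]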

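where $C$ depends on $\alpha, \lambda, t_0$. Thus, we obtain constants $\Gamma, R_0 < \infty$, depending
only on $\alpha, \lambda, \mu, t_0$, such that, for all $R \ge R_0$ and $N \ge 1$,

\[ \mathbb{P}\left( \int_0^{t_0} Y_t\,dt \ge \Gamma(1 + y) \text{ or } \int_0^{t_0} Z_t\,dt \ge \Gamma(1 + y + z)N \right) \le e^{-R}. \tag{6} \]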



7. Hypergraph cores 



The approximation of Markov chains by differential equations is a powerful tool
in probabilistic combinatorics, and in particular in the asymptotic analysis of
structures within large random graphs and hypergraphs. It is sometimes possible
to find an algorithm, whose progress can be described in terms of a Markov
chain, and whose terminal value gives information about the structure of interest.
If this Markov chain can be approximated by a differential equation, then
this may provide an effective means of computation. We shall describe in detail
an implementation of this approach which yields a quantitative description of
the $k$-core for a general class of random hypergraphs. Here $k \ge 2$ is an integer,
which will remain fixed throughout.
7.1. Specification of the problem

Let $V$ and $E$ be finite sets. A hypergraph with vertex set $V$ and edge-label set
$E$ is a subset $\gamma$ of $V \times E$. Given a hypergraph $\gamma$, define, for $v \in V$ and $e \in E$,
sets $\gamma(v) = \gamma \cap (\{v\} \times E)$ and $\gamma(e) = \gamma \cap (V \times \{e\})$. The sets $\gamma(e)$ are the
(hyper)edges of the hypergraph $\gamma$. Figure 3 gives two pictorial representations
of a small hypergraph. The degree and weight functions $d_\gamma : V \to \mathbb{Z}^+$ and
$w_\gamma : E \to \mathbb{Z}^+$ of $\gamma$ are given by $d_\gamma(v) = |\gamma(v)|$ and $w_\gamma(e) = |\gamma(e)|$. The $k$-core
$\bar\gamma$ of $\gamma$ is the largest subset $\bar\gamma$ of $\gamma$ such that, for all $v \in V$ and $e \in E$,

\[ d_{\bar\gamma}(v) \in \{0\} \cup \{k, k + 1, \dots\}, \qquad w_{\bar\gamma}(e) \in \{0, w_\gamma(e)\}. \]

Thus, if we call a sub-hypergraph of $\gamma$ any hypergraph obtained by deleting
edges from $\gamma$, then $\bar\gamma$ is the largest sub-hypergraph of $\gamma$ in which every vertex of
non-zero degree has degree at least $k$. It is not hard to see that any algorithm
which deletes recursively edges containing at least one vertex of degree less than
$k$ terminates at the $k$-core $\bar\gamma$. The $k$-core is of interest because it is a measure of
the strength of connectivity present in $\gamma$; see [17], [19], [20].
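
The recursive deletion just described can be sketched in a few lines. This is one possible deterministic variant, which in each round deletes every edge containing a vertex of degree less than $k$; the representation of a hypergraph as a Python set of `(vertex, edge_label)` pairs and the function name `k_core` are assumptions made for this illustration.

```python
from collections import defaultdict

def k_core(incidences, k):
    """Return the k-core of a hypergraph given as a set of (vertex, edge_label) pairs."""
    gamma = set(incidences)
    while True:
        # Current degree of every vertex that still has an incidence.
        degree = defaultdict(int)
        for v, _ in gamma:
            degree[v] += 1
        light = {v for v, d in degree.items() if d < k}
        if not light:
            return gamma
        # Delete, as a whole, every edge that contains a light vertex.
        doomed = {e for v, e in gamma if v in light}
        gamma = {(v, e) for v, e in gamma if e not in doomed}
```

Each round removes at least one whole edge, so the loop terminates, and edges are only ever removed in full, which matches the constraint that core weights lie in $\{0, w_\gamma(e)\}$.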

A frequency vector is a vector $n = (n_d : d \in \mathbb{Z}^+)$ with $n_d \in \mathbb{Z}^+$ for all $d$.
We write $m(n) = \sum_d d n_d$. Given a function $d : V \to \mathbb{Z}^+$, define its frequency
vector $n(d) = (n_d(d) : d \in \mathbb{Z}^+)$ by $n_d(d) = |\{v \in V : d(v) = d\}|$, and set
$m(d) = m(n(d)) = \sum_v d(v)$. The frequency vectors of a hypergraph $\gamma$ are then
the pair $p(\gamma), q(\gamma)$, where $p(\gamma) = n(d_\gamma)$ and $q(\gamma) = n(w_\gamma)$. Note that $m(p(\gamma))$
is simply the cardinality of $\gamma$, as of course is $m(q(\gamma))$.
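
As a small illustration of these definitions, the frequency vectors of a hypergraph can be computed by counting. The sketch assumes the hypergraph is stored as a set of `(vertex, edge_label)` pairs and uses `collections.Counter` for frequency vectors; the function names are hypothetical.

```python
from collections import Counter

def total(n):
    """m(n) = sum over d of d * n_d for a frequency vector n."""
    return sum(d * count for d, count in n.items())

def hypergraph_frequencies(gamma, V, E):
    """Frequency vectors p(gamma) = n(d_gamma) and q(gamma) = n(w_gamma)."""
    degree = Counter(v for v, _ in gamma)   # d_gamma(v) = |gamma(v)|
    weight = Counter(e for _, e in gamma)   # w_gamma(e) = |gamma(e)|
    p = Counter(degree[v] for v in V)       # vertices absent from gamma have degree 0
    q = Counter(weight[e] for e in E)
    return p, q
```

Both `total(p)` and `total(q)` count the incidences of the hypergraph, so they agree with `len(gamma)`.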

The datum for our model is a pair of non-zero frequency vectors $p, q$ with
$m(p) = m(q) = m < \infty$. Note that there exists an integer $L \ge 2$ such that
$p_d = q_w = 0$ for all $d, w \ge L + 1$. We assume also that $p_0 = q_0 = 0$. This
will result in no essential loss of generality. Fix an integer $N \ge 1$. We shall
be interested in the limit as $N \to \infty$. Choose sets $V$ and $E$ and functions
$d : V \to \mathbb{Z}^+$ and $w : E \to \mathbb{Z}^+$ such that $n(d) = Np$ and $n(w) = Nq$. In
particular, this implies that $|V| = N\sum_d p_d$ and $|E| = N\sum_w q_w$. Denote by
$G(d, w)$ the set of hypergraphs on $V \times E$ with degree function $d$ and weight
function $w$. Thus

\[ G(d, w) = \{\gamma \subseteq V \times E : d_\gamma = d,\ w_\gamma = w\} \]

and, in particular, all elements of $G(d, w)$ have cardinality $Nm$. This set is
known to be non-empty for $N$ sufficiently large. Its elements can also be thought
of as bipartite graphs on $V \cup E$ with given degrees. We shall be interested in the
distribution of the $k$-core $\bar\Gamma$ when $\Gamma$ is a hypergraph chosen uniformly at random
from $G(d, w)$. We write $\Gamma \sim U(d, w)$ for short. Set $D = d_{\bar\Gamma}$ and $W = w_{\bar\Gamma}$. These
are the degree and weight functions of the $k$-core. Define for $d, d', w \ge 0$

\[ P_{d,d'} = |\{v \in V : d(v) = d',\ D(v) = d\}|/N, \qquad Q_w = |\{e \in E : W(e) = w\}|/N. \tag{7} \]

Note that, given $(P_{d,d'} : k \le d \le d')$ and $(Q_w : w \ge 1)$, we can recover the other
non-zero frequencies from the equations

\[ P_{0,d'} = p_{d'} - \sum_{k \le d \le d'} P_{d,d'}, \qquad Q_0 = \sum_{w \ge 0} q_w - \sum_{w \ge 1} Q_w, \]

and, given all these frequencies, the joint distribution of $\Gamma$ and its $k$-core $\bar\Gamma$ is
otherwise dictated by symmetry$^9$. For $D$ and $W$ are independent and uniformly
distributed, subject to the equations (7) and to the constraint $W(e) \in \{0, w(e)\}$
for all $e \in E$. Moreover, we shall see that, given $D$ and $W$, $\bar\Gamma \sim U(D, W)$. The
problem of characterizing the distribution of the $k$-core thus reduces to that
of understanding the frequencies $(P_{d,d'} : k \le d \le d')$ and $(Q_w : w \ge 1)$.

7.2. Branching process approximation 

In this subsection we describe an approximation to the local structure of a
hypergraph $\Gamma \sim U(d, w)$ on which the later analysis relies, and which is
illustrated in Figure 3. We work in a more general set-up than the sequence
parametrized by $N$ just described. Fix $L < \infty$ and degree and weight functions
$d, w$, with $m(d) = m(w) = m$. We consider the limit $m \to \infty$ subject to
$d, w \le L$. Note that this limit applies to the set-up of the preceding subsection,
where $m(d) = Nm$ with $m$ fixed and $N \to \infty$. Choose a random vertex $v$
according to the distribution $d/m$ and set $D = d(v)$. Enumerate randomly the
subset $\Gamma(v) = \{(v, e_1), \dots, (v, e_D)\}$ and set $S_i = w(e_i) - 1$, $i = 1, \dots, D$. For
$i = 1, \dots, D$, enumerate randomly the set of vertices in $\Gamma(e_i)$ which are distinct
from $v$, thus

\[ \Gamma(e_i) = \{v, v_{i,1}, \dots, v_{i,S_i}\} \times \{e_i\}, \]

and set $L_{i,j} = d(v_{i,j}) - 1$. Write $A$ for the event that the vertices $v_{i,j}$ are all
distinct. Thus

\[ A = \{(v', e_i), (v', e_j) \in \Gamma \text{ implies } v' = v \text{ or } i = j\}. \]

Let $T$ be a discrete alternating random tree, having types $V, E$, with degree
distributions $\tilde p, \tilde q$ respectively, and having base point $\tilde v$ of type $V$. Here $\tilde p, \tilde q$
are the size-biased distributions obtained from $p = n(d)$ and $q = n(w)$ by
$\tilde p_d = dp_d/m$, $\tilde q_w = wq_w/m$, $d, w \ge 0$. This may be considered as a branching
process starting from the single individual $\tilde v$, which has $\tilde D$ offspring $\tilde e_1, \dots, \tilde e_{\tilde D}$ of
type $E$, where $\tilde D$ has distribution $\tilde p$; then all individuals of type $E$ have offspring
of type $V$, the numbers of these being independent and having distribution $\sigma$;
all individuals of type $V$ have offspring, of type $E$, the numbers of these being
independent and, with the exception of $\tilde v$, having distribution $\lambda$. Here $\lambda$ and $\sigma$
are given by

\[ \lambda_d = (d + 1)p_{d+1}/m, \qquad \sigma_w = (w + 1)q_{w+1}/m, \qquad d, w \ge 0. \tag{8} \]

For $i = 1, \dots, \tilde D$, write $\tilde S_i$ for the number of offspring of $\tilde e_i$ and, for $j = 1, \dots, \tilde S_i$,
write $\tilde L_{i,j}$ for the number of offspring of the $j$th offspring of $\tilde e_i$. Then, conditional
on $\tilde D = d$, the random variables $\tilde S_1, \dots, \tilde S_d$ are independent, of distribution $\sigma$,
and, further conditioning on $\tilde S_i = s_i$ for $i = 1, \dots, d$, the random variables
$\tilde L_{i,j}$, $i = 1, \dots, d$, $j = 1, \dots, s_i$, are independent, of distribution $\lambda$.

$^9$ For the marginal distribution of the $k$-core, only the frequencies
$P_d = |\{v \in V : D(v) = d\}|/N = \sum_{d' \ge d} P_{d,d'}$
are relevant, but the asymptotics of $P_d$ turn out to split naturally over $d'$, see (11) below.

It is known (see [15] or, for a more explicit statement, [5]) that there is a
function $\psi_0 : \mathbb{N} \to [0, 1]$, depending only on $L$, with $\psi_0(m) \to 0$ as $m \to \infty$, such
that, for all degree and weight functions $d, w \le L$ with $m(d) = m(w) = m$, we
have $\mathbb{P}(A) \ge 1 - \psi_0(m)$ and there is a coupling of $\Gamma$ and $T$ such that $D = \tilde D$ and,
with probability exceeding $1 - \psi_0(m)$, we have $S_i = \tilde S_i$ for all $i$ and $L_{i,j} = \tilde L_{i,j}$
for all $i, j$.

Fig 3. The left picture shows a hypergraph with eight vertices, three 2-edges, and three
3-edges. An incidence is selected at random, shown by the enlarged vertex, and chosen as root
of a branching process, shown as the bottom vertex on the right. The root has two hyperedge
offspring, shown as grey squares. One of these has two vertex offspring, and so on.

The following paragraph presents a heuristic argument which leads quickly to
a prediction for the asymptotic frequencies of core degrees and weights, which we
shall later verify rigorously, subject to an additional condition. The convergence
of $\Gamma$ to $T$, near a randomly chosen vertex, which we expressed in terms of
the function $\psi_0$ for the first two steps, in fact holds in a similar sense for any
given numbers of steps. The algorithm of deleting, recursively, all edges in $\Gamma$
containing any vertex of degree less than $k$ terminates at the $k$-core $\bar\Gamma$. Consider
the following analogous algorithm on the branching process: we remove in the
first step all individuals of type $E$ having some offspring with fewer than $k - 1$
offspring of its own; then repeat this step infinitely often. Set $g_0 = 1$. For $n \ge 0$,
write $s_n$ for the probability that, after $n$ steps, a given individual of type $E$
remains in the population, and write $g_{n+1}$ for the probability that, after $n$
steps, a given individual of type $V$ (distinct from $\tilde v$) has at least $k - 1$ offspring
remaining. Then, by a standard type of branching process argument,

\[ s_n = \sigma(g_n), \qquad g_{n+1} = \sum_{j \ge k-1} \sum_d \binom{d}{j} \lambda_d s_n^j (1 - s_n)^{d-j}, \qquad n \ge 0. \]

We write here $\sigma$, and below $\lambda$, for the probability generating functions

\[ \sigma(z) = \sum_w \sigma_w z^w, \qquad \lambda(z) = \sum_d \lambda_d z^d. \tag{9} \]

So $g_{n+1} = \phi(g_n)$, where

\[ \phi(g) = \sum_{j \ge k-1} \sum_d \binom{d}{j} \lambda_d \sigma(g)^j (1 - \sigma(g))^{d-j}. \tag{10} \]

Note that, in the case $k = 2$, we have the simple formula

\[ \phi(g) = 1 - \lambda(1 - \sigma(g)). \]

Fig 4. The function $\phi(x) = 1 - \lambda(1 - \sigma(x))$ is shown for $x \in (0, 1)$, where $\lambda(y)$ is a truncated
Poisson probability generating function with mean 2.75, and $\sigma(x) = 0.02 + 0.08x + 0.6x^2 +
0.2x^3 + 0.1x^4$. The largest intersection with the line $y = x$ gives the value $g^*$ needed for the
2-core fluid limit.

Since $\phi$ maps $[0, 1]$ continuously to $[0, 1)$ and is increasing, as may be verified
by differentiation, the equation $\phi(g) = g$ has a root in $[0, 1)$ and $g_n$ converges
to the largest such root $g^*$ as $n \to \infty$. See Figure 4. Suppose that we accept
the branching process as a suitable approximation to the hypergraph for the
calculation of the core. Then we are led to the following values for the limiting
core frequencies:

\[ \hat p_{d,d'} = \binom{d'}{d} \sigma(g^*)^d (1 - \sigma(g^*))^{d'-d} p_{d'}, \quad k \le d \le d', \qquad \hat q_w = (g^*)^w q_w, \quad w \ge 1. \tag{11} \]
We have not justified the interchange of limits which would be required to
turn this into a rigorous argument. This seems unlikely to be straightforward.
For, by analogy with Theorem 2.2 in [4], in the critical case when $\phi(g) \le g$ in a
neighbourhood of $g^*$, we would expect that, asymptotically, the core frequencies
would take values corresponding to smaller roots of $\phi(g) = g$ with probability
1/2. Thus, in this case, when also the crossing condition of Theorem 7.1 fails,
the branching process heuristic would lead to an incorrect conclusion. However,
for certain random graphs, this sort of approach was made to work in [20].
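
The iteration $g_{n+1} = \phi(g_n)$ started from $g_0 = 1$ is straightforward to carry out numerically. The sketch below implements formula (10) and iterates until successive values agree. The polynomial $\sigma$ is the one quoted in the caption of Figure 4; the offspring law $\lambda$ is an assumption, namely Poisson weights of parameter 2.75 cut off at degree 12 and renormalised, since the precise truncation behind the figure is not specified here. All function names are hypothetical.

```python
import math

def pgf(coeffs, z):
    """Evaluate the generating function sum_i coeffs[i] * z^i."""
    return sum(c * z ** i for i, c in enumerate(coeffs))

def phi(g, lam, sigma, k):
    """Formula (10): sum over j >= k-1 and d of C(d,j) lam_d s^j (1-s)^(d-j), s = sigma(g)."""
    s = pgf(sigma, g)
    return sum(
        math.comb(d, j) * lam[d] * s ** j * (1.0 - s) ** (d - j)
        for d in range(len(lam))
        for j in range(k - 1, d + 1)
    )

def largest_root(lam, sigma, k, tol=1e-12, max_iter=100000):
    """Iterate g -> phi(g) downwards from g = 1 until successive values agree."""
    g = 1.0
    for _ in range(max_iter):
        g_next = phi(g, lam, sigma, k)
        if abs(g_next - g) < tol:
            return g_next
        g = g_next
    return g

def truncated_poisson(mean, d_max):
    """Assumed truncation: Poisson weights on 0..d_max, renormalised to sum to 1."""
    weights = [math.exp(-mean) * mean ** d / math.factorial(d) for d in range(d_max + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]
```

For $k = 2$ the double sum in `phi` collapses to $1 - \lambda(1 - \sigma(g))$, which gives a convenient consistency check on the implementation.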

7.3. Statement of result 

We return to the framework described in Subsection 7.1. Thus now $n(d) = Np$
and $n(w) = Nq$ for our given frequency vectors $p$ and $q$. Define the distributions
$\lambda$ and $\sigma$ by the equations (8)$^{10}$. The normalized core frequencies $P_{d,d'}$ and $Q_w$
were defined at (7) and the limiting core frequencies $\hat p_{d,d'}$ and $\hat q_w$ were defined
at (11). The following result will be proved in Subsection 7.7 using a differential
equation approximation to a suitably chosen Markov chain.

Theorem 7.1. Assume that either $g^* = 0$ or the following crossing condition
holds:

\[ g^* = \sup\{g \in [0, 1) : \phi(g) > g\}. \]

Then, for all $\nu \in (0, 1]$, there is a constant $C < \infty$, depending only on $p$, $q$ and
$\nu$, such that, for all $N \ge 1$,

\[ \mathbb{P}\left( \sup_{k \le d \le d'} |P_{d,d'} - \hat p_{d,d'}| > \nu \text{ or } \sup_{w \ge 1} |Q_w - \hat q_w| > \nu \right) \le Ce^{-N/C}. \]

7.4. Splitting property

A uniform random hypergraph $\Gamma \sim U(d, w)$ has a useful splitting property,
which we now describe. Given a partition $V = V' \cup V''$, we can identify a
hypergraph $h$ on $V \times E$ with the pair of hypergraphs $h', h''$ on $V' \times E$, $V'' \times E$
respectively, obtained by intersection. Consider the partition

\[ G(d, w) = \bigcup_{w' + w'' = w} G(d', w') \times G(d'', w''), \]

where $d', d''$ are the restrictions of $d$ to $V', V''$ respectively, and where $w', w''$
range over all weight functions on $E$ subject to the given constraint. We deduce
that, conditional on $\{w_{\Gamma'} = w', w_{\Gamma''} = w''\}$, the hypergraphs $\Gamma'$ and $\Gamma''$ are
independent, with $\Gamma' \sim U(d', w')$ and $\Gamma'' \sim U(d'', w'')$. By symmetry, an
analogous splitting property holds in respect of any partition of $E$. In particular, if
$v \in V$ and $e \in E$ are chosen independently of $\Gamma$, then $\Gamma \setminus \Gamma(v)$ and $\Gamma \setminus \Gamma(e)$ are
also uniformly distributed given their vertex degrees and edge weights.

$^{10}$ We have chosen for simplicity to consider a sequential limit in which these distributions
remain fixed: the interpretation of (8) in the preceding subsection differs by a factor of $N$, top
and bottom, which cancels to leave $\lambda$ and $\sigma$ independent of $N$.
7.5. Analysis of a core-finding algorithm

Given a hypergraph $\gamma$ on $V \times E$, set $\gamma_0 = \gamma$ and define recursively a sequence of
hypergraphs $(\gamma_n)_{n \ge 0}$ as follows: for $n \ge 0$, given $\gamma_n$, choose if possible, uniformly
at random, a vertex $v_{n+1} \in V$ such that $d = d_{\gamma_n}(v_{n+1}) \in \{1, \dots, k - 1\}$ and set

\[ \gamma_{n+1} = \gamma_n \setminus (\gamma_n(e_1) \cup \dots \cup \gamma_n(e_d)), \]

where $\gamma_n(v_{n+1}) = \{(v_{n+1}, e_1), \dots, (v_{n+1}, e_d)\}$; if there is no such vertex, set
$\gamma_{n+1} = \gamma_n$. Thus we remove from $\gamma_n$ all edges containing the chosen vertex
$v_{n+1}$. The sequence terminates at the $k$-core $\bar\gamma$.

Take $\Gamma \sim U(d, w)$ and consider the corresponding sequence $(\Gamma_n)_{n \ge 0}$. We
continue to write $v_n$ for the random vertices chosen in the algorithm. Set
$D_n = d_{\Gamma_n}$ and $W_n = w_{\Gamma_n}$. In the sequel we shall use the symbols $j, k, l, d, d'$
to denote elements of $\mathbb{Z}^+ \times \{V\}$, while $w$ will denote an element of $\mathbb{Z}^+ \times \{E\}$.
This is just a formal device which will allow us to refer to two different sets of
coordinates by $\xi^d$ and $\xi^w$ and, to lighten the notation, we shall identify both
these sets with $\mathbb{Z}^+$ where convenient. For $0 \le d \le d'$ and $w \ge 0$, set

\[ \xi_n^{d,d'} = |\{v \in V : D_n(v) = d,\ d(v) = d'\}|, \qquad \xi_n^w = |\{e \in E : W_n(e) = w\}|. \]

Set

\[ \xi_n = (\xi_n^{d,d'}, \xi_n^w : 0 \le d \le d',\ w \ge 0). \]

Note that the process $(\xi_n)_{n \ge 0}$ is adapted to the filtration $(\mathcal{F}_n)_{n \ge 0}$ given by

\[ \mathcal{F}_n = \sigma(D_r, W_r : r = 0, 1, \dots, n). \]

Proposition 7.2. For all $n \ge 0$, conditional on $\mathcal{F}_n$, we have $\Gamma_n \sim U(D_n, W_n)$.

Proof. The claim is true for $n = 0$ by assumption. Suppose inductively that the
claim holds for $n$. The algorithm terminates on the $\mathcal{F}_n$-measurable event

\[ \{D_n(v) \in \{0\} \cup \{k, k + 1, \dots\} \text{ for all } v \in V\}, \]

so on this event the claim holds also for $n + 1$. Suppose then that the algorithm
does not terminate at $n$. Conditional on $\mathcal{F}_n$, $v_{n+1}$ and $\Gamma_n$ are independent.
Hence, by splitting, $\Gamma_n \setminus \Gamma_n(v_{n+1})$ is uniform given its vertex degrees and edge
weights. Then, by a further splitting, we can delete each of the edges $\Gamma_n(e)$ with
$(v_{n+1}, e) \in \Gamma_n$, still preserving this uniform property, to obtain $\Gamma_{n+1}$. Hence the
claim holds for $n + 1$ and the induction proceeds. □

Note that the conditional distribution of $v_{n+1}$ given $\mathcal{F}_n$ depends only on
$D_n$ and that $(D_{n+1}, W_{n+1})$ is a function of $\Gamma_{n+1}$, and hence is a function
of $(v_{n+1}, \Gamma_n)$. It follows that $(D_n, W_n)_{n \ge 0}$ is a Markov chain and hence, by
symmetry, $(\xi_n)_{n \ge 0}$ is also a Markov chain. It will be convenient to denote the
state-space by $S$, to define for $\xi \in S$,

\[ \xi^d = \sum_{d'=d}^{L} \xi^{d,d'}, \qquad n(\xi) = \sum_{d=1}^{k-1} \xi^d, \qquad l(\xi) = \sum_{d=1}^{k-1} d\xi^d, \qquad h(\xi) = \sum_{d=k}^{L} d\xi^d \]

and

\[ m(\xi) = \sum_{w=1}^{L} w\xi^w, \qquad p(\xi) = \sum_{w=1}^{L} w(w - 1)\xi^w, \]

and to set $q(\xi) = m(\xi)n(\xi)/l(\xi)$. Thus, $\xi^d$ is the number of vertices of degree
$d$, and $n(\xi)$ is the number of light vertices, that is, those of degree less than $k$;
$l(\xi)$ is the total degree of the light vertices, and $h(\xi)$ is the total degree of the
heavy vertices; $m(\xi)$ is the total weight, and $p(\xi)$ is the number of ordered pairs of
elements of $\xi$ having the same edge label. Note that, for all $\xi \in S$, $m(\xi) \le Nm$
and $n(\xi) \le l(\xi)$, so $q(\xi) \le Nm$. We obtain a continuous-time Markov chain
$(X_t)_{t \ge 0}$ by taking $(\xi_n)_{n \ge 0}$ as jump chain and making jumps at rate $q(X_t)$. As
we saw in Subsection 5.2, in the study of terminal values, we are free to choose a
convenient jump rate, which should, in particular, ensure that the terminal time
remains tight in the limit of interest. Our present choice will have this property.
However, it has been chosen also so that the limiting differential equation has a
simple form. Define now coordinate functions $x^{d,d'}$ and $x^w$ on $S$, for $k \le d \le d'$
and $w \ge 1$, by

\[ x^{d,d'}(\xi) = \xi^{d,d'}/N, \qquad x^w(\xi) = \xi^w/N. \]

Set

\[ \mathbf{X}_t = x(X_t) = (x^{d,d'}(X_t), x^w(X_t) : k \le d \le d' \le L,\ 1 \le w \le L). \]

We consider $\mathbf{X}$ as a process in $\mathbb{R}^D$, where $D = \frac{1}{2}(L - k + 1)(L - k + 2) + L$. We
shall use $h(x)$, $m(x)$ and $p(x)$ to denote functions of

\[ x = (x^{d,d'}, x^w : k \le d \le d' \le L,\ 1 \le w \le L) \in \mathbb{R}^D, \]

defined as for $\xi \in S$, but replacing $\xi^{d,d'}$ and $\xi^w$ by $x^{d,d'}$ and $x^w$ respectively.
Note that the jumps of $\mathbf{X}$ are bounded in supremum norm by $(k - 1)(L - 1)/N$.
Note also that $h(\mathbf{X}_t) \le m(\mathbf{X}_t)$ for all $t$ and that the algorithm terminates at
$T_0 = \inf\{t \ge 0 : h(\mathbf{X}_t) = m(\mathbf{X}_t)\}$. Hence $\mathbf{X}_{T_0}$ is the desired vector of core
frequencies:

\[ P_{d,d'} = \mathbf{X}_{T_0}^{d,d'}, \qquad Q_w = \mathbf{X}_{T_0}^w. \tag{12} \]

Recall that $m = m(p) = m(q)$ is a given constant. We also write $m(x)$ for
the function on $\mathbb{R}^D$ just defined. Thus $m = m(\mathbf{X}_0)$. Let

\[ U_0 = \{x \in \mathbb{R}^D : x^{d,d'}, x^w \in [0, m],\ m(x) > 0\} \]

and note that $x(\xi) \in U_0$ for all $\xi \in S \setminus \{0\}$. Define a vector field $b$ on $U_0$ by

\[ b^{d,d'}(x) = \frac{p(x)}{m(x)}\left\{ (d + 1)x^{d+1,d'} - dx^{d,d'} \right\}, \qquad k \le d \le d' \le L, \]

where $x^{d'+1,d'} = 0$ for $k \le d' \le L$, and

\[ b^w(x) = -wx^w, \qquad 1 \le w \le L. \]

Define, as in Section 4, the drift vector $\beta$ on $S$ by

\[ \beta(\xi) = \sum_{\xi' \ne \xi} (x(\xi') - x(\xi))\, q(\xi, \xi'). \]
eve 
Proposition 7.3. There is a decreasing function $\psi : \mathbb{N} \to [0, 1]$, depending only
on $p$ and $q$, with $\psi(N) \to 0$ as $N \to \infty$, such that, for all $\xi \in S$ with $x(\xi) \in U_0$,

\[ \|\beta(\xi) - b(x(\xi))\| \le \psi(m(\xi)). \]

Proof. Fix $\xi \in S$ and condition on $\xi_0 = \xi$. Then, for $l = 1, \dots, k - 1$, we have
$d(v_1) = l$ with probability $\xi^l/n(\xi)$. Condition further on $v_1 = v$ and $d(v) = l$
and write $\Gamma(v) = \{v\} \times \{e_1, \dots, e_l\}$. We use the notation of Subsection 7.2 for
the local structure. Then we have $\xi_1^w - \xi_0^w = -\sum_{i=1}^{l} 1_{\{S_i = w-1\}}$, so

\[ \beta^w(\xi) = -\frac{q(\xi)}{N} \sum_{l=1}^{k-1} \frac{l\,\xi^l}{n(\xi)}\, \sigma_{w-1}(\xi, l), \]

where

\[ \sigma_{w-1}(\xi, l) = \mathbb{P}(S_1 = w - 1 \mid \xi_0 = \xi,\ d(v_1) = l). \]

By the branching process approximation, we can find a function $\psi$, of the
required form, such that

\[ m\,|\sigma_{w-1}(\xi, l) - w\xi^w/m(\xi)| \le \psi(m(\xi)), \qquad w = 1, \dots, L. \]

After some straightforward estimation we obtain, for the same function $\psi$, the
required estimate

\[ |\beta^w(\xi) - b^w(x(\xi))| \le \psi(m(\xi)). \]
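We turn to the remaining components. Note that $|\xi_1^{d,d'} - \xi_0^{d,d'}| \le (k - 1)(L - 1)$.
Recall from Subsection 7.2 the event

\[ A = \{(v', e_i), (v', e_j) \in \Gamma \text{ implies } v' = v \text{ or } i = j\}. \]

Condition on $S_1, \dots, S_l$ and on $L_{i,j}$ for $j = 1, \dots, S_i$. On $A$, by symmetry, we
have $\xi_1^{d,d'} - \xi_0^{d,d'} = Z^{d+1,d'} - Z^{d,d'}$, where $Z^{d,d'}$ has binomial distribution with
parameters $\sum_{i=1}^{l} \sum_{j=1}^{S_i} 1_{\{L_{i,j} = d-1\}}$ and $\xi^{d,d'}/\xi^d$. Now,

\[ \beta^{d,d'}(\xi) = q(\xi)\, \mathbb{E}(\xi_1^{d,d'} - \xi_0^{d,d'} \mid \xi_0 = \xi)/N \]

and

\[ \mathbb{E}(Z^{d,d'} \mid \xi_0 = \xi) = \frac{\xi^{d,d'}}{\xi^d} \sum_{l=1}^{k-1} \sum_{w=1}^{L} \frac{l\,\xi^l}{n(\xi)}\, \sigma_{w-1}(\xi, l)(w - 1)\lambda_{d-1}(\xi, l, w - 1), \]

where

\[ \lambda_{d-1}(\xi, l, w - 1) = \mathbb{P}(L_{1,1} = d - 1 \mid \xi_0 = \xi,\ d(v_1) = l,\ S_1 = w - 1). \]

By the branching process approximation, we can find a function $\psi$, of the
required form, such that $m\,\mathbb{P}(A^c) \le \psi(m(\xi))$ and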

m|A d (£, - 1) - (d+ l)£ d+1 /m(£)| ^ i(L - 1)((L + 4J#(ro(0). 
Then, by some straightforward estimation, for the same function $\psi$,

\[ |\beta^{d,d'}(\xi) - b^{d,d'}(x(\xi))| \le \psi(m(\xi)). \]

□

7.6. Solving the differential equation

Consider the limiting differential equation $\dot{x}_t = b(x_t)$ in $U_0$, with starting point
$x_0 = \mathbf{X}_0$ given by

$$x_0^w = q_w, \quad x_0^{d,d} = p_d, \quad x_0^{d,d'} = 0, \qquad 1 \le w \le L, \quad k \le d < d' \le L.$$

In components, the equation is written

$$\dot{x}_t^w = -w x_t^w, \qquad w \ge 1,$$

$$\dot{x}_t^{d,d'} = \frac{p(x_t)\{(d+1)x_t^{d+1,d'} - d\,x_t^{d,d'}\}}{m(x_t)}, \qquad k \le d \le d' \le L.$$

There is a unique solution $(x_t)_{t \ge 0}$ in $U_0$ and, clearly, $x_t^w = e^{-tw} q_w$. Then
$m(x_t) = m e^{-t}\sigma(e^{-t})$ and $p(x_t) = m e^{-2t}\sigma'(e^{-t})$. Hence, if $(\tau_t)_{t \ge 0}$ is defined
by

$$\dot{\tau}_t = p(x_t)/m(x_t), \qquad \tau_0 = 0,$$

then $e^{-\tau_t} = \sigma(e^{-t})$. A straightforward computation now shows that the
remaining components of the solution are given by

$$x_t^{d,d'} = \binom{d'}{d} \sigma(e^{-t})^d \{1 - \sigma(e^{-t})\}^{d'-d} p_{d'},$$

and that $h(x_t) = m\phi(e^{-t})\sigma(e^{-t})$. Note that $(m - h)(x_t) = m\sigma(e^{-t})(e^{-t} - \phi(e^{-t}))$,
so $g^* = e^{-\zeta_0}$, where $\zeta_0 = \inf\{t \ge 0 : m(x_t) = h(x_t)\}$.
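
As an informal numerical check of the time change, the sketch below integrates $p(x_t)/m(x_t) = e^{-t}\sigma'(e^{-t})/\sigma(e^{-t})$ and compares the result with $-\log \sigma(e^{-t})$. It assumes that $\sigma$ is a power series with $\sigma(1) = 1$; the coefficients in `P` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical coefficients of sigma(s) = sum_d P[d] * s**d, with sigma(1) = 1.
# These values are illustrative only and are not taken from the paper.
P = {2: 0.5, 3: 0.3, 4: 0.2}

def sigma(s):
    return sum(p * s ** d for d, p in P.items())

def sigma_prime(s):
    return sum(p * d * s ** (d - 1) for d, p in P.items())

def tau_rate(t):
    # p(x_t)/m(x_t) = e^{-t} sigma'(e^{-t}) / sigma(e^{-t}); the constant m cancels.
    s = math.exp(-t)
    return s * sigma_prime(s) / sigma(s)

def tau_numeric(t, steps=2000):
    # Composite Simpson rule for tau_t = int_0^t p(x_s)/m(x_s) ds (steps must be even).
    h = t / steps
    total = tau_rate(0.0) + tau_rate(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * tau_rate(i * h)
    return total * h / 3

def tau_closed(t):
    # Closed form implied by exp(-tau_t) = sigma(e^{-t}).
    return -math.log(sigma(math.exp(-t)))
```

Under these assumptions, `tau_numeric(t)` and `tau_closed(t)` should agree up to quadrature error.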



7.7. Proof of Theorem 7.1



Recall that the core frequencies are found at the termination of the core-finding 
algorithm, see (12). A suitably chosen vector of frequencies evolves under this 
algorithm as a Markov chain, which we can approximate using the differential 
equation whose solution we have just obtained. The accuracy of this approxi- 
mation is good so long as the hypergraph remains large. 

Consider first the case where $g^* = 0$, when we have $m(x_t) > h(x_t)$ for all
$t \ge 0$. Here the hypergraph may become small as the algorithm approaches
termination, so we run close to termination and then use a monotonicity argument.
Fix $\nu \in (0,1]$, set $\mu = \nu/3$ and choose $t_0$ such that $m(x_{t_0}) = 2\mu$. Define

$$U = \{x \in U_0 : m(x) > h(x) \vee \mu\}$$

and set

$$\zeta = \inf\{t \ge 0 : x_t \notin U\}, \qquad T = \inf\{t \ge 0 : \mathbf{X}_t \notin U\},$$

as in Section 4. Since $m(x_t)$ is decreasing in $t$, we have $x_t \in U$ for all $t \le t_0$.
Hence there exists $\varepsilon \in (0, \nu/(3L))$, depending only on $p$, $q$ and $\nu$, such that

$$\text{for all } \xi \in S \text{ and } t \le t_0, \qquad \|x(\xi) - x_t\| \le \varepsilon \implies x(\xi) \in U.$$
It is straightforward to check, by bounding the first derivative, that $b$ is Lipschitz
on $U$ with constant $K \le (L-1)L^3 m/\mu$. Set $\delta = \varepsilon e^{-Kt_0}/3$, as in Section 4. We
have $\mathbf{X}_0 = x_0$, so $\Omega_0 = \Omega$. By Proposition 7.3, and using the fact that $m(\mathbf{X}_t)$
does not increase, we have

$$\int_0^{T \wedge t_0} \|\beta(X_t) - b(x(X_t))\|\,dt \le \psi(m(X_T))\,t_0 = \psi(N m(\mathbf{X}_T))\,t_0 \le \psi(N\mu)\,t_0,$$

so $\Omega_1 = \Omega$ provided $N$ is large enough that $\psi(N\mu)t_0 \le \delta$. The total jump rate
is bounded by $Q = Nm$ for all $\xi \in S$. The norm of the largest jump is
bounded by $J = (k-1)(L-1)/N$. Take $A = QJ^2 e = (k-1)^2(L-1)^2 me/N$ and
note that $\delta J/(At_0) \le \delta/((k-1)(L-1)met_0) \le 1$, so $A \ge QJ^2\exp\{\delta J/(At_0)\}$,
and so $\Omega_2 = \Omega$ as in Subsection 4.2, Footnote 4. On the event
$\{\sup_{t \le t_0} \|\mathbf{X}_t - x_t\| \le \varepsilon\}$, we have $T > t_0$, so

$$\sum_{k \le d \le d'} d\,P_{d,d'} = \sum_{w \ge 1} w\,Q_w = m(\mathbf{X}_{T_0}) \le m(\mathbf{X}_{t_0})
 \le m(x_{t_0}) + |m(x_{t_0}) - m(\mathbf{X}_{t_0})| \le 2\mu + L\varepsilon \le \nu.$$
Hence, by Theorem 4.2, we obtain 

P sup Pd,<j' > f or sup Q w > v ) 

\k^d^d' w>l / 

< P f sup ||X t - x t || > e"U 2il e -' 52 /( 2 ' 4t0 ) < Ce"^, 

\t<t / 

for a constant $C \in [1,\infty)$ depending only on $p$, $q$ and $\nu$, which is the conclusion
of the theorem in the case $g^* = 0$.
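
The arithmetic behind the choice $A = QJ^2 e$ can be checked with a few lines of code. The sketch below uses the bounds $Q = Nm$ and $J = (k-1)(L-1)/N$ stated above; the numerical values passed to the functions are hypothetical and serve only to illustrate the inequality $A \ge QJ^2\exp\{\delta J/(At_0)\}$.

```python
import math

def jump_constants(N, m, k, L):
    Q = N * m                    # bound on the total jump rate
    J = (k - 1) * (L - 1) / N    # bound on the norm of the largest jump
    A = Q * J ** 2 * math.e      # the choice A = Q J^2 e
    return Q, J, A

def footnote_condition_holds(N, m, k, L, delta, t0):
    # Checks A >= Q J^2 exp(delta J / (A t0)) for the constants above.
    Q, J, A = jump_constants(N, m, k, L)
    return A >= Q * J ** 2 * math.exp(delta * J / (A * t0))
```

The condition holds whenever $\delta J/(At_0) \le 1$, which is the inequality noted in the text.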

We turn to the case where $g^* > 0$ and $g^* = \sup\{g \in [0,1) : \phi(g) > g\}$. Set
now $\mu = \frac12 m(x_{\zeta_0})$ and choose $t_0 > \zeta_0$. Define $U$, $\zeta$ and $T$ as in the preceding
paragraph, noting that $\zeta = \zeta_0$. We seek to apply the refinement of Theorem
4.3 described in Footnote 6, and refer to Subsection 4.3 for the definition of $\rho$.
By the crossing condition, $\phi(g) > g$ immediately below $g^*$, so
$(m - h)(x_t) = m\sigma(e^{-t})(e^{-t} - \phi(e^{-t})) < 0$ immediately after $\zeta_0 = -\log g^*$. We have

$$|(m-h)(x) - (m-h)(x')| \le C\|x - x'\|, \qquad |m(x) - m(x')| \le C\|x - x'\|,$$

for a constant $C < \infty$ depending only on $L$. So, given $\nu > 0$, we can choose
$\varepsilon > 0$, depending only on $p$, $q$ and $\nu$, such that $\varepsilon + \rho(\varepsilon) \le \nu$ and
$C(\varepsilon + \rho(\varepsilon)) \le \frac12 m(x_\zeta)$. Note that $\|\mathbf{X}_T - x_\zeta\| \le \varepsilon + \rho(\varepsilon)$ implies that
$m(\mathbf{X}_T) \ge \frac12 m(x_\zeta)$ and hence that $T = T_0$. Define $\delta$ and $A$ as in the preceding
paragraph. Then, by a similar argument, provided $N$ is sufficiently large that
$\psi(N\mu)t_0 \le \delta$, we have $\Omega_0 = \Omega_1 = \Omega_2 = \Omega$. Hence, by Theorem 4.3,

P ( sup |P rf)d / - p d ,d' \ > v or sup IQu, - > f J 

\k^d^d' w^l J 

= P (||X T - if II > e + p(e)) < 2£ e - 52 /( 2A *»> Ce"^ , 
for a constant $C \in [1,\infty)$ depending only on $p$, $q$ and $\nu$, as required.

8. Appendix: Identification of martingales for a Markov chain

We discuss in this appendix the identification of martingales associated with a
continuous-time Markov chain $X = (X_t)_{t \ge 0}$ with finite jump rates. In keeping
with the rest of the paper, we assume that $X$ has a countable state-space, here
denoted $E$, and write $Q = (q(x,y) : x, y \in E)$ for the associated generator
matrix. An extension to the case of a general measurable state-space is possible
and requires only cosmetic changes. A convenient and elementary construction
of such a process $X$ may be given in terms of its jump chain $(Y_n)_{n \ge 0}$ and holding
times $(S_n)_{n \ge 1}$. We shall deduce, directly from this construction, a method to
identify the martingales associated with $X$, which proceeds by expressing them
in terms of a certain integer-valued random measure $\mu$. There is a close analogy
between this method and the common use of Itô's formula in the case of diffusion
processes. The method is well known to specialists but we believe there is value
in this direct derivation from the elementary construction. Our arguments in
this section involve more measure theory than the rest of the paper; we do not,
however, need the theory of Markov semigroups.



8.1. The jump-chain and holding-time construction 

The jump chain is a sequence $(Y_n)_{n \ge 0}$ of random variables in $E$, and the holding
times $(S_n)_{n \ge 1}$ are non-negative random variables which may sometimes take the
value $\infty$. We specify the distributions of these random variables in terms of the
jump matrix $\Pi = (\pi(x,y) : x, y \in E)$ and the jump rates $(q(x) : x \in E)$, given
by

$$\pi(x,y) = \begin{cases} q(x,y)/q(x), & y \ne x \text{ and } q(x) \ne 0, \\ 0, & y \ne x \text{ and } q(x) = 0, \end{cases}
\qquad
\pi(x,x) = \begin{cases} 0, & q(x) \ne 0, \\ 1, & q(x) = 0, \end{cases}$$

and

$$q(x) = -q(x,x) = \sum_{y \ne x} q(x,y).$$

Take $Y = (Y_n)_{n \ge 0}$ to be a discrete-time Markov chain with transition matrix
$\Pi$. Thus, for all $n \ge 0$, and all $x_0, x_1, \dots, x_n \in E$,

$$\mathbb{P}(Y_0 = x_0, Y_1 = x_1, \dots, Y_n = x_n) = \lambda(x_0)\pi(x_0,x_1)\cdots\pi(x_{n-1},x_n),$$

where $\lambda(x) = \mathbb{P}(Y_0 = x)$. Take $(T_n)_{n \ge 1}$ to be a sequence of independent
exponential random variables of parameter 1, independent of $Y$. Set

$$S_n = T_n/q(Y_{n-1}), \qquad J_0 = 0, \qquad J_n = S_1 + \dots + S_n, \qquad \zeta = \sum_{n=1}^\infty S_n,$$

and construct $X$ by

$$X_t = \begin{cases} Y_n, & J_n \le t < J_{n+1}, \\ \partial, & t \ge \zeta, \end{cases}$$
where $\partial$ is some cemetery state, which we adjoin to $E$. We are now using $\zeta$
for the explosion time of the Markov chain $X$, at variance with the rest of the
paper. For $t \ge 0$, define

$$J_1(t) = \inf\{s > t : X_s \ne X_t\}, \qquad Y_1(t) = X_{J_1(t)}.$$

These are, respectively, the time and destination of the first jump of $X$ starting
from time $t$. Consider the natural filtration $(\mathcal{F}_t^X)_{t \ge 0}$, given by
$\mathcal{F}_t^X = \sigma(X_s : s \le t)$. Write $\mathcal{E}$ for the set of subsets of $E$ and set $q(\partial) = 0$
and $\pi(x,B) = \sum_{y \in B} \pi(x,y)$ for $B \in \mathcal{E}$.
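
The jump-chain and holding-time construction translates directly into a simulation. The following is a minimal sketch, not part of the original text: the generator is assumed to be supplied as a dictionary `rates[x] = {y: q(x, y)}` of off-diagonal rates (a hypothetical representation), and the path is only generated up to a finite horizon `t_max`, so explosion is not treated.

```python
import random

def simulate_chain(rates, x0, t_max, rng):
    """Return jump times J_0 = 0 <= J_1 <= ... and states Y_0, Y_1, ... up to t_max.

    rates[x] is a dict {y: q(x, y)} of off-diagonal rates out of state x.
    """
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        out = rates.get(x, {})
        q_x = sum(out.values())          # q(x) = sum over y != x of q(x, y)
        if q_x == 0.0:                   # absorbing state: the holding time is infinite
            break
        t += rng.expovariate(1.0) / q_x  # S_n = T_n / q(Y_{n-1}) with T_n ~ Exp(1)
        if t > t_max:
            break
        ys = list(out)
        # Jump chain step with probabilities pi(x, y) = q(x, y) / q(x).
        x = rng.choices(ys, weights=[out[y] for y in ys])[0]
        times.append(t)
        states.append(x)
    return times, states

def state_at(times, states, t):
    """X_t = Y_n on J_n <= t < J_{n+1}, for t within the simulated horizon."""
    n = max(i for i, j in enumerate(times) if j <= t)
    return states[n]
```

For example, `simulate_chain({0: {1: 2.0}, 1: {0: 1.0, 2: 1.0}, 2: {}}, 0, 10.0, random.Random(0))` produces one path of a three-state chain in which state 2 is absorbing.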

Proposition 8.1. For all $s, t \ge 0$ and all $B \in \mathcal{E}$, we have, almost surely,

$$\mathbb{P}(J_1(t) > t + s, Y_1(t) \in B \mid \mathcal{F}_t^X) = \pi(X_t, B)e^{-q(X_t)s}.$$

Before proving the proposition, we need a lemma, which expresses in precise
terms that, if $X$ has made exactly $n$ jumps by time $t$, then all we know at that
time are the states $Y_0, \dots, Y_n$, the times $J_1, \dots, J_n$ and the fact that the next
jump happens later.

Lemma 8.2. Define $\mathcal{G}_n = \sigma(Y_m, J_m : m \le n)$. For all $A \in \mathcal{F}_t^X$ and all $n \ge 0$,
there exists $A_n \in \mathcal{G}_n$ such that

$$A \cap \{J_n \le t < J_{n+1}\} = A_n \cap \{t < J_{n+1}\}.$$

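Proof. Denote by $\mathcal{A}_t$ the set of all sets $A \in \mathcal{F}_t^X$ for which the desired property
holds. Then $\mathcal{A}_t$ is a $\sigma$-algebra. For any $s \le t$, we can write

$$\{X_s \in B\} \cap \{J_n \le t < J_{n+1}\} = A_n \cap \{t < J_{n+1}\},$$

where

$$A_n = \bigcup_{m=0}^{n-1} \{Y_m \in B, J_m \le s < J_{m+1}, J_n \le t\} \cup \{Y_n \in B, J_n \le s\},$$

so $\{X_s \in B\} \in \mathcal{A}_t$. Hence $\mathcal{A}_t = \mathcal{F}_t^X$. □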

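Proof of Proposition 8.1. The argument relies on the memoryless property of
the exponential distribution, in the following conditional form: for $s, t \ge 0$ and
$n \ge 0$, almost surely, on $\{J_n \le t\}$,

$$\mathbb{P}(J_{n+1} > t + s \mid \mathcal{G}_n) = \mathbb{P}(T_{n+1} > q(Y_n)(s + t - J_n) \mid \mathcal{G}_n)
 = e^{-q(Y_n)(s+t-J_n)} = e^{-q(Y_n)s}\,\mathbb{P}(J_{n+1} > t \mid \mathcal{G}_n).$$

Then for $B \in \mathcal{E}$ and $A \in \mathcal{F}_t^X$, we have

$$\mathbb{P}(J_1(t) > t + s, Y_1(t) \in B, A, J_n \le t < J_{n+1}) = \mathbb{P}(J_{n+1} > t + s, Y_{n+1} \in B, A_n)
 = \mathbb{E}(\pi(Y_n, B)e^{-q(Y_n)s} 1_{A_n \cap \{J_{n+1} > t\}}) = \mathbb{E}(\pi(X_t, B)e^{-q(X_t)s} 1_{A \cap \{J_n \le t < J_{n+1}\}})$$

and

$$\mathbb{P}(J_1(t) > t + s, Y_1(t) \in B, A, t \ge \zeta) = \delta_\partial(B)\,\mathbb{P}(A, t \ge \zeta) = \mathbb{E}(\pi(X_t, B)e^{-q(X_t)s} 1_{A \cap \{t \ge \zeta\}}).$$

On summing all the above equations we obtain

$$\mathbb{P}(J_1(t) > t + s, Y_1(t) \in B, A) = \mathbb{E}(\pi(X_t, B)e^{-q(X_t)s} 1_A),$$

as required. □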

8.2. Markov chains in a given filtration

For many purposes, the construction of a process $X$ which we have just given
serves as a good definition of a continuous-time Markov chain with generator $Q$.
However, from now on, we adopt a more general definition, which has the merit
of expressing a proper relationship between $X$ and a general given filtration
$(\mathcal{F}_t)_{t \ge 0}$. Assume that $X$ is constant on the right, that is to say, for all $t \ge 0$,
there exists $\varepsilon > 0$ such that $X_s = X_t$ whenever $t \le s < t + \varepsilon$. Set $J_0 = 0$ and
define for $n \ge 0$,

$$Y_n = X_{J_n}, \qquad J_{n+1} = \inf\{t \ge J_n : X_t \ne X_{J_n}\}. \qquad (13)$$

For $t \ge 0$, define $J_1(t)$ and $Y_1(t)$ as above. Assume that $X$ is minimal, so that
$X_t = \partial$ for all $t \ge \zeta$, where $\zeta = \sup_n J_n$. Assume finally that $X$ is adapted to
$(\mathcal{F}_t)_{t \ge 0}$. Then we say that $X$ is a continuous-time $(\mathcal{F}_t)_{t \ge 0}$-Markov chain with
generator $Q$ if, for all $s, t \ge 0$, and all $B \in \mathcal{E}$, we have, almost surely,

$$\mathbb{P}(J_1(t) > t + s, Y_1(t) \in B \mid \mathcal{F}_t) = \pi(X_t, B)e^{-q(X_t)s}.$$

The process constructed above from jump chain and holding times is constant
on the right and minimal and we do recover the jump chain and holding times
using (13); moreover by Proposition 8.1, such a process is then a continuous-time
Markov chain in its natural filtration. The defining property of a continuous-time
Markov chain extends to stopping times.

Proposition 8.3. Let $X$ be an $(\mathcal{F}_t)_{t \ge 0}$-Markov chain with generator $Q$ and let
$T$ be a stopping time. Then, for all $s \ge 0$ and $B \in \mathcal{E}$, on $\{T < \infty\}$, almost
surely,

$$\mathbb{P}(J_1(T) > T + s, Y_1(T) \in B \mid \mathcal{F}_T) = \pi(X_T, B)e^{-q(X_T)s}.$$

Proof. Consider the stopping times $T_m = 2^{-m}\lceil 2^m T \rceil$. Note that $T_m \downarrow T$ as
$m \to \infty$ so, since $X$ is constant on the right, $X_{T_m} = X_T$, $J_1(T_m) = J_1(T)$ and
$Y_1(T_m) = Y_1(T)$ eventually as $m \to \infty$, almost surely. Suppose $A \in \mathcal{F}_T$ with
$A \subseteq \{T < \infty\}$. Then for all $k \in \mathbb{Z}^+$, $A \cap \{T_m = k2^{-m}\} \in \mathcal{F}_{k2^{-m}}$, so, almost
surely,

$$\mathbb{P}(J_1(T_m) > T_m + s, Y_1(T_m) \in B, A, T_m = k2^{-m})
 = \mathbb{E}(\pi(X_{T_m}, B)e^{-q(X_{T_m})s} 1_{A \cap \{T_m = k2^{-m}\}}),$$

and, summing over $k$,

$$\mathbb{P}(J_1(T_m) > T_m + s, Y_1(T_m) \in B, A) = \mathbb{E}(\pi(X_{T_m}, B)e^{-q(X_{T_m})s} 1_A).$$

Letting $m \to \infty$, we can replace, by bounded convergence, $T_m$ by $T$, thus proving
the proposition. □
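
The dyadic approximation used in this proof is easy to experiment with. The sketch below simply evaluates $T_m = 2^{-m}\lceil 2^m T \rceil$ for a fixed number $T$; the function name is hypothetical and the code is only an illustration of the fact that $T_m$ decreases to $T$ through multiples of $2^{-m}$.

```python
import math

def dyadic_upper(T, m):
    """T_m = 2^{-m} * ceil(2^m * T): the smallest multiple of 2^{-m} that is >= T."""
    return math.ceil(2 ** m * T) / 2 ** m

# Successive approximations of T = 0.3 from above.
approximations = [dyadic_upper(0.3, m) for m in range(1, 12)]
```

Each value in `approximations` is at least $T$, the sequence is non-increasing, and the error at level $m$ is at most $2^{-m}$.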

8.3. The jump measure and its compensator

The jump measure $\mu$ of $X$ and its compensator $\nu$ are random measures on
$(0,\infty) \times E$, given by

$$\mu = \sum_{t : X_t \ne X_{t-}} \delta_{(t, X_t)} = \sum_{n=1}^\infty \delta_{(J_n, Y_n)}$$

and

$$\nu(dt, B) = q(X_{t-}, B)\,dt = q(X_{t-})\pi(X_{t-}, B)\,dt, \qquad B \in \mathcal{E}.$$

Recall that the previsible $\sigma$-algebra $\mathcal{P}$ on $\Omega \times (0,\infty)$ is the $\sigma$-algebra generated
by all left-continuous adapted processes. Extend this notion in calling a function
defined on $\Omega \times (0,\infty) \times E$ previsible if it is $\mathcal{P} \otimes \mathcal{E}$-measurable.
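
For a single piecewise-constant path, both $\mu((0,t] \times B)$ and $\nu((0,t] \times B)$ can be computed by hand. The following sketch is illustrative only: it assumes a path stored as lists of jump times and states (with `times[0] = 0`) and a hypothetical dictionary `rates[x] = {y: q(x, y)}` of off-diagonal rates.

```python
def jump_measure(times, states, t, B):
    """mu((0, t] x B): the number of jumps in (0, t] that land in B."""
    return sum(1 for j, y in zip(times[1:], states[1:]) if j <= t and y in B)

def compensator(times, states, rates, t, B):
    """nu((0, t] x B) = int_0^t q(X_{s-}, B) ds for a piecewise-constant path."""
    total = 0.0
    for n, x in enumerate(states):
        start = times[n]
        end = times[n + 1] if n + 1 < len(times) else float("inf")
        length = min(end, t) - start   # time spent in state x before time t
        if length <= 0:
            break
        q_xB = sum(r for y, r in rates.get(x, {}).items() if y in B)
        total += q_xB * length
    return total
```

The difference of the two quantities is the value at time $t$ of the compensated process obtained by taking $H(s,y) = 1_B(y)$ in the theorem that follows.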

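Theorem 8.4. Let $H$ be previsible and assume that, for all $t \ge 0$,

$$\mathbb{E} \int_0^t \int_E |H(s,y)|\,\nu(ds,dy) < \infty.$$

Then the following process is a well-defined martingale

$$M_t = \int_{(0,t] \times E} H(s,y)(\mu - \nu)(ds,dy).$$

Define measures $\bar\mu$ and $\bar\nu$ on $\mathcal{P} \otimes \mathcal{E}$ by

$$\bar\mu(D) = \mathbb{E}(\mu(D)), \qquad \bar\nu(D) = \mathbb{E}(\nu(D)), \qquad D \in \mathcal{P} \otimes \mathcal{E}.$$

We shall show that $\bar\mu = \bar\nu$. Once this is done, the proof of Theorem 8.4 will be
straightforward. For $n \ge 0$, define measures $\bar\mu_n$ and $\bar\nu_n$ on $\mathcal{P} \otimes \mathcal{E}$ by

$$\bar\mu_n(D) = \bar\mu(D \cap (J_n, J_{n+1}]), \qquad \bar\nu_n(D) = \bar\nu(D \cap (J_n, J_{n+1}]), \qquad D \in \mathcal{P} \otimes \mathcal{E},$$

where $D \cap (J_n, J_{n+1}] = \{(\omega, t, y) \in D : J_n(\omega) < t \le J_{n+1}(\omega)\}$. Then, since
$q(\partial) = 0$,

$$\bar\mu = \sum_{n=0}^\infty \bar\mu_n, \qquad \bar\nu = \sum_{n=0}^\infty \bar\nu_n,$$

so it will suffice to show the following lemma.

Lemma 8.5. For all $n \ge 0$, we have $\bar\mu_n = \bar\nu_n$.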

Proof. The proof rests on the following basic identity for an exponential random
variable $V$ of parameter $q$:

$$\mathbb{P}(V \le s) = q\,\mathbb{E}(V \wedge s), \qquad s \ge 0.$$

Let $T$ be a stopping time and let $S$ be a non-negative $\mathcal{F}_T$-measurable random
variable. Set $U = (T + S) \wedge J_1(T)$. By Proposition 8.3, we know that, conditional
on $\mathcal{F}_T$, $J_1(T)$ and $Y_1(T)$ are independent, $J_1(T) - T$ has exponential distribution
of parameter $q(X_T)$ and $Y_1(T)$ has distribution $\pi(X_T, \cdot)$. From the basic identity,
we obtain

$$\mathbb{P}(J_1(T) - T \le S \mid \mathcal{F}_T) = q(X_T)\,\mathbb{E}((J_1(T) - T) \wedge S \mid \mathcal{F}_T) = q(X_T)\,\mathbb{E}(U - T \mid \mathcal{F}_T).$$

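Fix $n \ge 0$, $t < u$, $A \in \mathcal{F}_t$, $B \in \mathcal{E}$ and set $D = A \times (t, u] \times B$. The set of such
sets $D$ forms a $\pi$-system, which generates the $\sigma$-algebra $\mathcal{P} \otimes \mathcal{E}$. We shall show
that $\bar\mu_n(D) = \bar\nu_n(D) \le 1$. By taking $A = \Omega$, $B = E$, $t = 0$ and letting $u \to \infty$,
this shows also that $\bar\mu_n$ and $\bar\nu_n$ have the same finite total mass. The lemma will
then follow by uniqueness of extension.

Take $T = (J_n \vee t) \wedge J_{n+1}$ and $S = (u - T)^+ 1_{\{T < J_{n+1}\}}$. Then

$$U = (T + S) \wedge J_1(T) = (J_n \vee u) \wedge J_{n+1}.$$

So

$$\begin{aligned}
\bar\mu_n(D) &= \bar\mu(D \cap (J_n, J_{n+1}]) = \mathbb{P}(J_1(T) \le T + S, Y_1(T) \in B, A) \\
&= \mathbb{E}(1_A\,\mathbb{P}(J_1(T) - T \le S, Y_1(T) \in B \mid \mathcal{F}_T)) \\
&= \mathbb{E}(1_A\,q(X_T)\pi(X_T, B)\,\mathbb{E}(U - T \mid \mathcal{F}_T)) \\
&= \mathbb{E}(1_A\,q(X_T, B)(U - T)) = \mathbb{E}\Big(1_A \int_T^U q(X_s, B)\,ds\Big) \\
&= \bar\nu(D \cap (J_n, J_{n+1}]) = \bar\nu_n(D),
\end{aligned}$$

as required. □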
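
For completeness, the basic identity for an exponential random variable used at the start of this proof can be checked in one line. The computation below assumes $q \in (0,\infty)$ and uses only the tail-integral formula $\mathbb{E}(V \wedge s) = \int_0^s \mathbb{P}(V > u)\,du$; when $q = 0$ both sides vanish.

```latex
\begin{align*}
q\,\mathbb{E}(V \wedge s)
  = q \int_0^s \mathbb{P}(V > u)\,du
  = q \int_0^s e^{-qu}\,du
  = 1 - e^{-qs}
  = \mathbb{P}(V \le s), \qquad s \ge 0.
\end{align*}
```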

Proof of Theorem 8.4. For a non-negative previsible function $H$, for $s \le t$ and
$A \in \mathcal{F}_s$, by Fubini's theorem,

$$\mathbb{E}\Big(1_A \int_{(s,t] \times E} H(r,y)\,\mu(dr,dy)\Big) = \int_{A \times (s,t] \times E} H\,d\bar\mu
 = \int_{A \times (s,t] \times E} H\,d\bar\nu = \mathbb{E}\Big(1_A \int_{(s,t] \times E} H(r,y)\,\nu(dr,dy)\Big).$$

So, taking $A = \Omega$ and $s = 0$, if

$$\mathbb{E} \int_0^t \int_E H(r,y)\,\nu(dr,dy) < \infty$$

for all $t \ge 0$, then $M_t$ is well-defined and integrable and, now with general $s \le t$
and $A \in \mathcal{F}_s$,

$$\mathbb{E}((M_t - M_s) 1_A) = 0.$$

The result extends to general previsible functions $H$ by taking differences. □

8.4. Martingale estimates

Theorem 8.4 makes it possible to identify martingales associated with $X$ in a
manner analogous to Itô's formula. We illustrate this by deriving three
martingales $M$, $N$ and $Z$ associated with a given function $f : E \to \mathbb{R}$. The processes
$M$ and $N$ depend, respectively, linearly and quadratically on $f$, whereas $Z$ is an
exponential martingale. We use $N$ and $Z$ to obtain quadratic and exponential
martingale inequalities for $M$, which are used in the main part of the paper.
We emphasise that $f$ can be any function. In the main part of the paper, we
work with the martingales associated with several choices of $f$ at once. In this
subsection we do not burden the notation by registering further the dependence
of everything on $f$. The discussion that follows has a computational aspect and
an analytic aspect. The reader may wish to check the basic computations before
considering in detail the analytic part. We note for orientation that, in the
simple case where the maximum jump rate is bounded and where $f$ also is bounded,
there is no explosion and $M$, $N$ and $Z$, as defined below, are all martingales,
without any need for reduction by stopping times. For simplicity, we make
an assumption in this subsection that $X$ does not explode. A reduction to this
case is always possible by an adapted random time-change; this can allow the
identification of martingales in the explosive case by applying the results given
below and then inverting the time-change. We omit further details.
For all $t \in [0,\infty)$, we have $J_n \le t < J_{n+1}$ for some $n \ge 0$. Then

$$f(X_t) = f(Y_n) = f(Y_0) + \sum_{m=0}^{n-1} \{f(Y_{m+1}) - f(Y_m)\}
 = f(X_0) + \int_{(0,t] \times E} \{f(y) - f(X_{s-})\}\,\mu(ds,dy).$$

Define

$$\tau(x) = \sum_{y \ne x} |f(y) - f(x)|\,q(x,y)$$

and set $\zeta_1 = \inf\{t \ge 0 : \tau(X_t) = \infty\}$. Define when $\tau(x) < \infty$

$$\beta(x) = \sum_{y \ne x} \{f(y) - f(x)\}\,q(x,y).$$

Then, for $t \in [0,\infty)$ with $t \le \zeta_1$,

$$\int_{(0,t] \times E} \{f(y) - f(X_{s-})\}\,\nu(ds,dy)
 = \int_0^t \int_E \{f(y) - f(X_{s-})\}\,q(X_{s-},dy)\,ds = \int_0^t \beta(X_s)\,ds,$$

so

$$f(X_t) = f(X_0) + M_t + \int_0^t \beta(X_s)\,ds,$$

where

$$M_t = \int_0^t \int_E \{f(y) - f(X_{s-})\}(\mu - \nu)(ds,dy).$$

Define, as usual, for stopping times $T$, the stopped process $M^T_t = M_{T \wedge t}$.

Proposition 8.6. For all stopping times $T \le \zeta_1$, we have

$$\mathbb{E}\Big(\sup_{t \le T} |M_t|\Big) \le 2\,\mathbb{E} \int_0^T \tau(X_t)\,dt,$$

and, if the right hand side is finite, then $M^T$ is a martingale. Moreover $M^{\zeta_1}$ is
always a local martingale.

Proof. Let $T \le \zeta_1$ be a stopping time, with $\mathbb{E} \int_0^T \tau(X_t)\,dt < \infty$. Consider the
previsible process

$$H_1(t,y) = \{f(y) - f(X_{t-})\} 1_{\{t \le T\}}.$$

Then

$$M^T_t = \int_{(0,t] \times E} H_1(s,y)(\mu - \nu)(ds,dy),$$

$$\sup_{t \le T} |M_t| \le \int_{(0,\infty) \times E} |H_1(t,y)|(\mu + \nu)(dt,dy)$$

and

$$\int_{(0,\infty) \times E} |H_1(t,y)|\,\nu(dt,dy) = \int_0^T \tau(X_t)\,dt.$$

The first sentence of the statement now follows easily from Theorem 8.4. For
the second, it suffices to note that, for the stopping times
$T_n = \inf\{t \ge 0 : \tau(X_t) > n\} \wedge n$, we have $T_n \uparrow \zeta_1$ as $n \to \infty$ and
$\int_0^{T_n} \tau(X_t)\,dt \le n^2$, for all $n$, almost surely. □

We turn now to $L^2$ estimates, in the process identifying the martingale
decomposition of $M^2$. Note first the following identity: for $t \in [0,\infty)$ with $t \le \zeta_1$,

$$M_t^2 = 2\int_{(0,t] \times E} M_{s-}\{f(y) - f(X_{s-})\}(\mu - \nu)(ds,dy)
 + \int_{(0,t] \times E} \{f(y) - f(X_{s-})\}^2\,\mu(ds,dy). \qquad (14)$$

This may be established by verifying that the jumps of left and right hand sides
agree, and that their derivatives agree between jump times. Define

$$\alpha(x) = \sum_{x' \ne x} \{f(x') - f(x)\}^2\,q(x,x')$$

and set $\zeta_2 = \inf\{t \ge 0 : \alpha(X_t) = \infty\}$. By Cauchy-Schwarz, we have
$\tau(x)^2 \le \alpha(x)q(x)$ for all $x$, so $\zeta_2 \le \zeta_1$. For $t \in [0,\infty)$ with $t \le \zeta_2$ we can
write

$$M_t^2 = N_t + \int_0^t \alpha(X_s)\,ds, \qquad (15)$$

where

$$N_t = \int_{(0,t] \times E} H(s,y)(\mu - \nu)(ds,dy),$$

and

$$H(s,y) = 2M_{s-}\{f(y) - f(X_{s-})\} + \{f(y) - f(X_{s-})\}^2.$$
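
The quantities $q(x)$, $\tau(x)$, $\beta(x)$ and $\alpha(x)$ are finite sums over the possible jumps from $x$, so they are straightforward to evaluate for a concrete chain. The sketch below assumes a hypothetical dictionary representation `rates[x] = {y: q(x, y)}` of the off-diagonal rates; the example rates and the function $f$ are invented for illustration.

```python
def drift_quantities(rates, f, x):
    """Return (q(x), tau(x), beta(x), alpha(x)) for off-diagonal rates rates[x] = {y: q(x, y)}."""
    out = rates.get(x, {})
    q_x = sum(out.values())
    tau = sum(abs(f(y) - f(x)) * r for y, r in out.items())
    beta = sum((f(y) - f(x)) * r for y, r in out.items())
    alpha = sum((f(y) - f(x)) ** 2 * r for y, r in out.items())
    return q_x, tau, beta, alpha

# Hypothetical example: from state 2, jump to 3 at rate 2 and to 1 at rate 1.
example_rates = {2: {3: 2.0, 1: 1.0}}
q_x, tau, beta, alpha = drift_quantities(example_rates, lambda x: x * x / 100.0, 2)
```

In this example the Cauchy-Schwarz bound $\tau(x)^2 \le \alpha(x)q(x)$ can be confirmed numerically by comparing `tau ** 2` with `alpha * q_x`.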
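Proposition 8.7. For all stopping times $T \le \zeta_1$, we have

$$\mathbb{E}\Big(\sup_{t \le T} |M_t|^2\Big) \le 4\,\mathbb{E} \int_0^T \alpha(X_t)\,dt.$$

Moreover, $N^{\zeta_2}$ is a local martingale and, for all stopping times $T \le \zeta_2$ with
$\mathbb{E} \int_0^T \alpha(X_t)\,dt < \infty$, both $M^T$ and $N^T$ are martingales, and

$$\mathbb{E}\Big(\sup_{t \le T} |N_t|\Big) \le 5\,\mathbb{E} \int_0^T \alpha(X_t)\,dt.$$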



Proof. Let $T \le \zeta_1$ be a stopping time, with $\mathbb{E} \int_0^T \alpha(X_t)\,dt < \infty$; then $T \le \zeta_2$.
Consider the previsible process

$$H_2(t,y) = H(t,y) 1_{\{t \le T \wedge T_n\}},$$

where $T_n$ is defined in the preceding proof. Then

$$N_{T \wedge T_n} = \int_{(0,\infty) \times E} H_2(t,y)(\mu - \nu)(dt,dy)$$

and

$$\mathbb{E} \int_{(0,\infty) \times E} |H_2(s,y)|\,\nu(ds,dy)
 \le \mathbb{E} \int_0^{T \wedge T_n} \{2|M_t|\tau(X_t) + \alpha(X_t)\}\,dt \le 4n^4 + \mathbb{E} \int_0^T \alpha(X_t)\,dt < \infty.$$

Hence, by Theorem 8.4, the process $N^{T \wedge T_n}$ is a martingale. Replace $t$ by $T \wedge T_n$
in (15) and take expectations to obtain

$$\mathbb{E}(|M_{T \wedge T_n}|^2) = \mathbb{E} \int_0^{T \wedge T_n} \alpha(X_t)\,dt.$$

Apply Doob's $L^2$-inequality to the martingale $M^{T \wedge T_n}$ to obtain

$$\mathbb{E}\Big(\sup_{t \le T \wedge T_n} |M_t|^2\Big) \le 4\,\mathbb{E} \int_0^{T \wedge T_n} \alpha(X_t)\,dt.$$

Go back to (15) to deduce

$$\mathbb{E}\Big(\sup_{t \le T \wedge T_n} |N_t|\Big) \le 5\,\mathbb{E} \int_0^{T \wedge T_n} \alpha(X_t)\,dt.$$

On letting $n \to \infty$, we have $T_n \uparrow \zeta_1$, so we obtain the claimed estimates by
monotone convergence, which then imply that $M^T$ and $N^T$ are martingales.
Then we can let $T$ run through the sequence $T_n = \inf\{t \ge 0 : \alpha(X_t) > n\} \uparrow \zeta_2$
to see that $N^{\zeta_2}$ is a local martingale. □
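
As a sanity check on (15), consider the simplest example, which is not treated in the text: a Poisson process of rate $\lambda$ with $f(x) = \theta x$. Then $\beta(x) = \theta\lambda$, $M_t = \theta(X_t - \lambda t)$ and $\alpha(x) = \theta^2\lambda$, so taking expectations in (15) gives $\mathbb{E}(M_t^2) = \theta^2\lambda t$. The sketch below verifies this by summing the Poisson probability mass function; the function names and the truncation level `n_max` are assumptions made for the illustration.

```python
import math

def second_moment_of_M(lam, theta, t, n_max=400):
    """E(M_t^2) for M_t = theta * (X_t - lam * t) with X_t ~ Poisson(lam * t), by direct summation."""
    mean = lam * t
    total, term = 0.0, math.exp(-mean)   # term holds P(X_t = n), starting at n = 0
    for n in range(n_max + 1):
        total += term * (theta * (n - mean)) ** 2
        term *= mean / (n + 1)           # advance to P(X_t = n + 1)
    return total

def integrated_alpha(lam, theta, t):
    """int_0^t alpha(X_s) ds, where alpha(x) = theta^2 * lam is constant for a Poisson process."""
    return theta ** 2 * lam * t
```

The two functions should return the same value up to rounding and truncation error, provided `n_max` is large compared with $\lambda t$.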

Finally, we discuss an exponential martingale and estimate. Define for $x \in E$

$$\phi(x) = \sum_{y \ne x} h(f(y) - f(x))\,q(x,y),$$

where $h(a) = e^a - 1 - a \ge 0$, and set $\zeta^* = \inf\{t \ge 0 : \phi(X_t) = \infty\}$. Since
$e^a - a \ge |a|$ for all $a \in \mathbb{R}$, we have $\tau(x) \le \phi(x) + q(x)$, so $\zeta^* \le \zeta_1$. Define for
$t \in [0,\infty)$ with $t \le \zeta^*$

$$Z_t = \exp\Big\{M_t - \int_0^t \phi(X_s)\,ds\Big\}.$$

Then

$$Z_t = Z_0 + \int_{(0,t] \times E} H^*(s,y)(\mu - \nu)(ds,dy),$$

where

$$H^*(s,y) = Z_{s-}\{e^{f(y) - f(X_{s-})} - 1\}.$$

This identity may be verified in the same way as (14). Consider for $n \ge 0$ the
stopping time

$$U_n = \inf\{t \ge 0 : \phi(X_t) + \tau(X_t) > n\}$$

and note that $U_n \uparrow \zeta^*$ as $n \to \infty$.
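
The comparison $\tau(x) \le \phi(x) + q(x)$ used above can be checked numerically for any finite set of jumps. The sketch below again assumes a hypothetical dictionary `rates[x] = {y: q(x, y)}` of off-diagonal rates and is meant only as an illustration of the definitions.

```python
import math

def h(a):
    # h(a) = e^a - 1 - a, which is non-negative for all real a.
    return math.exp(a) - 1.0 - a

def phi_tau_q(rates, f, x):
    """Return (phi(x), tau(x), q(x)) for off-diagonal rates rates[x] = {y: q(x, y)}."""
    out = rates.get(x, {})
    phi = sum(h(f(y) - f(x)) * r for y, r in out.items())
    tau = sum(abs(f(y) - f(x)) * r for y, r in out.items())
    q_x = sum(out.values())
    return phi, tau, q_x
```

For instance, with jumps from state 2 to 3 at rate 2 and to 1 at rate 1, and $f(x) = x/2$, the function returns $\tau(2) = 1.5$, which lies below $\phi(2) + q(2)$.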
Proposition 8.8. For all stopping times $T \le \zeta_1$,

$$\mathbb{E}\Big(\exp\Big\{M_T - \int_0^T \phi(X_t)\,dt\Big\}\Big) \le 1,$$

and, for all $A, B \in [0,\infty)$,

$$\mathbb{P}\Big(\sup_{t \le T} M_t > B \text{ and } \int_0^T \phi(X_t)\,dt \le A\Big) \le e^{A-B}.$$

Moreover, $Z^{\zeta^*}$ is a local martingale and a supermartingale, and $Z^{U_n}$ is a
martingale for all $n$.
Proof. Consider for $n \ge 0$ the stopping time

$$V_n = \inf\{t \ge 0 : \phi(X_t) + \tau(X_t) > n \text{ or } Z_t > n\}$$

and note that $V_n \uparrow \zeta^*$ as $n \to \infty$. For all $n \ge 0$ and $t \ge 0$, we have

$$\mathbb{E} \int_{(0,V_n \wedge t] \times E} |H^*(s,y)|\,\nu(ds,dy) \le \mathbb{E} \int_0^{V_n \wedge t} |Z_s|\{\phi(X_s) + \tau(X_s)\}\,ds \le n^2 t < \infty.$$

Hence, by Theorem 8.4, for all $n$, the stopped process $Z^{V_n}$ is a martingale. So
$Z^{\zeta^*}$ is a local martingale, and hence is a supermartingale by the usual Fatou
argument. In particular, for all $t \ge 0$, we have $\mathbb{E}(Z_{t \wedge U_n}) \le 1$, so, for all $n \ge 0$,

$$\mathbb{E} \int_{(0,U_n \wedge t] \times E} |H^*(s,y)|\,\nu(ds,dy) \le \mathbb{E} \int_0^{U_n \wedge t} |Z_s|\{\phi(X_s) + \tau(X_s)\}\,ds \le nt < \infty,$$

and so $Z^{U_n}$ is a martingale. If $T \le \zeta_1$ is a stopping time, then $\mathbb{E}(Z_T) \le \mathbb{E}(Z_{T \wedge \zeta^*})$
and, by optional stopping, $\mathbb{E}(Z_{T \wedge \zeta^*}) \le \mathbb{E}(Z_0) = 1$, so $\mathbb{E}(Z_T) \le 1$, as required.
Finally, we can apply this estimate to $T_B = \inf\{t \ge 0 : M_t > B\} \wedge T$, noting
that $Z_{T_B} \ge e^{B-A}$ on the set

$$\Omega_0 = \Big\{\sup_{t \le T} M_t > B \text{ and } \int_0^T \phi(X_t)\,dt \le A\Big\},$$

to obtain $e^{B-A}\,\mathbb{P}(\Omega_0) \le \mathbb{E}(Z_{T_B}) \le 1$, as required. □
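
To see the exponential estimate at work in the simplest setting, take again a Poisson process of rate $\lambda$ with $f(x) = \theta x$ and $\theta > 0$, an example that is not in the text. Then $\phi(x) = \lambda h(\theta)$ is constant, so with the deterministic time $T = t$ and $A = \lambda h(\theta) t$ the proposition gives $\mathbb{P}(\sup_{s \le t} M_s > B) \le e^{A-B}$. The sketch below compares this bound with the exact value of the smaller probability $\mathbb{P}(M_t > B)$; it is therefore only a partial check, and all function names are assumptions.

```python
import math

def h(a):
    return math.exp(a) - 1.0 - a

def exponential_bound(lam, theta, t, B):
    """e^{A - B} with A = lam * h(theta) * t, the value of int_0^t phi(X_s) ds for a Poisson process."""
    return math.exp(lam * h(theta) * t - B)

def poisson_tail(lam, theta, t, B):
    """Exact P(M_t > B), where M_t = theta * (X_t - lam * t) and X_t ~ Poisson(lam * t)."""
    mean = lam * t
    threshold = mean + B / theta         # M_t > B exactly when X_t > threshold
    cdf, term, n = 0.0, math.exp(-mean), 0
    while n <= threshold:                # accumulate P(X_t <= threshold)
        cdf += term
        n += 1
        term *= mean / n
    return max(0.0, 1.0 - cdf)
```

For every choice of parameters with $\theta > 0$, `poisson_tail` should not exceed `exponential_bound`, in line with the proposition.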
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